











ED 067 958 



DOCUMENT RESUME 
48 



FL 003 661 



AUTHOR        Hutchins, John A.
TITLE         An Investigation of Spoken Brazilian Portuguese: Part
              I, Technical Report. Final Report.
INSTITUTION   Naval Inst., Annapolis, Md.
SPONS AGENCY  Institute of International Studies (DHEW/OE),
              Washington, D.C.
BUREAU NO     BR-8-0130
PUB DATE      Aug 72
CONTRACT      OEC-0-8-000130-3543-014
NOTE          79p.



EDRS PRICE    MF-$0.65 HC-$3.29
DESCRIPTORS   Computational Linguistics; *Computers; Data Bases;
              Educational Experiments; *Language Research; Modern
              Languages; Optical Scanners; *Portuguese; Romance
              Languages; Speech; *Syntax; *Word Frequency



ABSTRACT

This final report of a study which developed a 
working corpus of spoken and written Portuguese from which 
syntactical studies could be conducted includes computer-processed 
data on which the findings and analysis are based. A data base, 
obtained by taping some 487 conversations between Brazil and the 
United States, serves as the corpus from which a frequency list of 
some 2,000 words is derived. A print-out of the Key-Word-in-Context 
is also developed and intended for use by linguistic researchers. 
Descriptions of experimental procedures, findings, and 
recommendations are included. Supportive technical data and 
experimental information are appended. (RL) 

PREFACE



This final report is divided into two parts. Part I is 
the descriptive background, the frequency lists for both 
the spoken and the written corpus, and the most important 
findings together with recommendations. Part II is a study 
(a doctoral dissertation) by Clea Rameh entitled, "Toward a 
Computerized Syntactic Analysis of Portuguese." This study 
was based on a segment of the spoken corpus. 

The authors are fully aware that only the highlights 
and only a preliminary analysis of the linguistic phenomena 
of spoken Brazilian Portuguese can be presented here. How- 
ever, we have captured an excellent sampling of the spoken 
language, processed it, and made it available to others. 

Even from a time span, it should have value in the years to 
come since easy retrieval of its elements is possible in a 
number of different ways. 

The frequency lists, computer programs, and the pro- 
tocol will be found in the appendix. It is hoped that 
microfiche and/or microfilm copies of the corrected Key- 
Word-in-Context lists will shortly be available for 
interested scholars. Also, it may be possible for the Naval
Academy to put the lists "on line" in our Time-Sharing 
system so that users may access them over long-distance 
telephone lines. It is obvious that there is much to be 
done and that this project only represents a step in the 
direction of analysis of the spoken language. 

There were many individuals who were instrumental in 
helping us over the rough spots as well as making sub- 
stantial contributions. William L. Higgins, former project 
officer, accompanied every detail of the project and was un- 
tiring in his effort to see that the project was successful. 
Commander R.T.E. Bowler, Jr., USN (Ret.), provided us with 
an efficient and smooth fiscal operation through the U.S. 
Naval Institute. Professor John D. Yarbro, Chairman of the 
Naval Academy's Area-Languages Studies Department guided us 
through the difficult phase of proposal writing and setting 
up the project at the Naval Academy. Dr. James Nielson of 
the National Security Agency was most helpful with 
suggestions as to procedure as were Charles Holt and Marvin 
Peacock in the computer programming. In this same field,
Dr. Harold Kaplan, Professor of Mathematics at the Naval
Academy, designed some highly sophisticated computer pro- 
grams which provide for upper and lower case, various 
accent marks, and other characters to be alphabetized in 
any order whatsoever once the order has been so indicated. 



Yara Telles transcribed the recordings that produced our 
spoken corpus and prepared both manuscripts for optical 
scanning. Finally, my two colleagues, Dr. Guy J. Riccio, 
Head of the Spanish-Portuguese Division of the U.S. Naval 
Academy and Dr. Clea Rameh, of Georgetown University's 
School of Languages and Linguistics, because of their 
dedication and effort, deserve much of the credit for the 
successful conclusion of this undertaking. 

INTRODUCTION



I. Problems under consideration. 



Portuguese, in spite of being one of the most widely 
spoken languages in the world, has long been neglected by 
linguists and researchers. In Brazil alone almost one 
hundred million persons speak Portuguese. This study is 
mainly concerned with the spoken language of Brazil. And 
as Brazil increases in importance so does its language. 

Traditionally most linguistic research has been concerned 
with aspects of the written language. A glance at the titles 
of the many doctoral dissertations shows how few deal with 
the spoken language in spite of the fact that from ninety to 
ninety-five percent of all communication is transmitted in an 
oral form. The difficulty has always been that of obtaining 
a suitable spoken corpus and converting it to a non-volatile 
state so that the linguistic elements will be available for 
investigation. 

Up to now no analysis of spoken Brazilian Portuguese has 
been attempted by scientific, empirical methods. Serious 
technical problems are probably responsible for the few 
studies completed and these are only in English, French, and 
German. According to H.A. Gleason - "A written language is 
typically a reflection, independent in only limited ways, of 
the spoken language. As a picture of actual speech, it is 
inevitably imperfect and incomplete."1 Gleason adds later
that "Linguistics must start with thorough investigation of 
spoken language before it proceeds to study written language." 

Previous studies. 



The only comprehensive study of Brazilian speech is that 
of Professor Earl Thomas of Vanderbilt, The Syntax of Spoken 
Brazilian Portuguese.3 Based on a collection of notes taken
over a twenty-year period, Professor Thomas was able to pre- 
sent a concise analysis based on his subject data. His monu- 
mental work is extremely important to scholars in Portuguese 
and must be taken into serious consideration. Professor 
Thomas did not, however, use a data base of tape recordings 
and statistical compilations to arrive at his conclusions. 

It is perhaps at this point that this study hopes to make a 
contribution. 



Fred Ellison, at the University of Texas, started to
collect and has a small corpus of spoken Brazilian Portuguese,
a corpus which was obtained from directed, tape-recorded
interviews with Brazilian exchange students. The
interviews have been transcribed and apparently are available
to scholars. Seemingly, such interviews would not
fall in the category of free, natural language. The fact
that those being interviewed knew they were being recorded
probably would have inhibited them in some manner. Then
there is also the thought that directed interviews might
produce responses that lacked authenticity in natural speech
patterns.

An attempt to make a survey of the spoken language of 
Brazil, building a corpus of over a million words of spoken 
Brazilian Portuguese, by a group headed by Professor Adriano 
Kury of the University of Brasilia, was given up because of 
a lack of funds. While describing a research project that 
he was conducting on a syntactical analysis of Brazilian 
Portuguese, Professor Henry Hoge, Florida State University, 
told me that he chose the works of 27 different authors 
because of their colloquial style. He divided his corpus 
into two groups - one narrative and the other spoken or dia- 
logue. Professor Hoge stated that "in no case was it spoken 
language but (it) was a representation," adding that "not 
much could be done with the spoken language." Samples from
the 27 authors selected by Hoge form the data base for our 
literary Brazilian Portuguese corpus. 

This author contends that the "representations" of 
spoken Portuguese, or any other language for that matter, 
when found in a written form, will vary considerably from 
the actual spoken language. Charles F. Hockett felt that 
"no writing system has ever provided for the graphic repre- 
sentation of everything that counts (morphemically or phone- 
mically) in speech" . . . "We do not write English as we 
speak it."5 And in Computational Analysis of Present-day
American English, Mary Lois Marckworth and Laura M. Bell
state, "It would be well to remember that in the fiction
selections of the Corpus, dialogue represents artistic
rather than actual rendering of the spoken language."6

Written or literary Portuguese has been studied for 
some time. Back in 1945 Charles B. Brown, Wesley M. Carr, 
and Milton L. Shane compiled a Graded Word Book of Brazilian 
Portuguese. An "eyeball" count of 1,200,000 running words
produced a total of some 26,278 different words. Of these 
different words only 9,345 were found in five or more 
different sources. The data base was extracted from prose, 
drama, newspapers, and technical journals. To simplify the 
operation, the most common (in their judgement) 222 words 
were excluded from the counting.7

In 1951 Charles Brown and Milton Shane published their
Brazilian Portuguese Idiom List, containing some 3,500 items.
Interestingly enough, the authors felt that "only in prose
can one find the bases for normal idiomatic usage."8 In
their selections from novels and children's books, they
examined only pages containing dialogue, claiming that "with
this procedure it may be said that more than 50 per cent of
our materials represent conversation, that is, to the extent
that the printed page reflects this usage."9

More recently Professor John R. Kelly (Santa Barbara) 
developed a list of the five hundred most common words he 
found in a 127,000 word corpus taken from Brazilian novels, 
periodicals, and newspapers. Using computational methods, 
Kelly was able to include even the most common words in his 
count.10 Also, at Stanford, John C. Duncan produced a
frequency list derived from selections of continental
Portuguese written between 1918 and 1939.11

Objective.

Our objective was to develop a working corpus, first of 
spoken and then written Portuguese from which syntactical 
studies could be conducted. The original objective was 
limited to establishing a list by frequency of occurrence of 
the 2,000 most commonly used words in spoken Brazilian Portu- 
guese. As the project progressed, it became evident that in 
addition to the frequency lists, and with some increased 
effort, we could obtain magnificent print-outs of the Key- 
Word-in-Context (KWIC) for both corpora, lists which would 
have immense value for in-depth linguistic studies. 

Methods.

There is really no ideal way to capture the spoken 
language, the uninhibited, natural, and spontaneous language. 
Various experiments and attempts have been made using such 
techniques as hidden microphones, direct interrogation, and 
recording radio broadcasts - all with less than satisfactory 
results. Because of this a unique method was developed,
which, while it may have certain shortcomings, does give an 
almost natural speech pattern between two natives in a real 
situation in which actual information is exchanged. 

The data base was obtained by taping "phone-patch"
conversations between Brazil and the United States, dialogues
transmitted on Amateur frequencies which are in the public
domain and which can be monitored by anyone. All these
conversations were recorded without the knowledge of the
participants. One factor that detracted from the naturalness of
the dialogues was that it was necessary to say "câmbio" or
"over." However, this was more than compensated by the fact
that no conversations were superimposed on others - that is,
there were never two people talking at the same time. A
perfect identification could be established at all times as to
who was speaking. The participants were so intensely
interested in the subjects being discussed that they paid
little attention to the medium being used or to their manner
of speaking. Each conversation represented a unique
opportunity to speak to a loved one or friend some five thousand
miles away. In summation, it is quite remarkable to listen
to the tapes and note the naturalness in the way the
"informants" talk to each other.

Actual recording began in July 1967, being supported 
through a small grant from the U.S. Naval Academy Research 
Council. In the spring of 1968 a contract was negotiated 
between the U.S. Office of Education and the U.S. Naval In- 
stitute, acting in behalf of the Naval Academy, to broaden 
the scope of the original project so that meaningful results 
could be obtained concerning the speech of Brazil. 

In processing the data for computer input, certain in- 
novations were tried and proven successful. It became 
apparent that it would be quite feasible to handle a large 
size literary corpus of Brazilian Portuguese, using the same 
methods with only slight changes. A revision to the original 
contract provided for the processing of 10,000 word segments 
each from 27 contemporary Brazilian authors. The list of 
authors, titles, and selections is included in the Appendix. 

A few changes were incorporated in the protocol in order to 
transcribe the original texts as closely as possible. 

Description.



Some 487 different conversations between native 
Brazilians were recorded on 142 reels of 600 foot tape and 
34 reels of 1,200 foot magnetic tape, 7 1/2 ips., full track. 
Since the frequency range was from 500 to 3,000 cycles, no 
provisions have been made for using these tapes for
phonological studies. There are a total of 837 persons, 367 men
and 470 women, of whom at least fifteen have since passed
away. About three-fourths of the men could be identified by
profession, this being much more difficult for the women. 

Recording was done on a random basis without a pre- 
determined plan. A scientific sampling such as described by 
Leslie Kish,12 could hardly be justified because of the
tremendous difficulties, effort, and expense involved. The 
problem of obtaining speech samples from the many vast areas 
of the country is itself a formidable undertaking. 

We were most fortunate in having the majority of our
conversations voiced by persons living in or having come
from Rio de Janeiro. The carioca accent, as Earl Thomas
points out,13 is the prestige speech of Brazil and carries
with it the mystique of the cidade maravilhosa. Also, with
nationwide television programs originating in Rio de Janeiro,
this accent is rapidly becoming the standard of the country.
To a lesser extent we do have many dialogues originating in
São Paulo, Porto Alegre, Curitiba, Vitória, Bahia, Recife,
Belo Horizonte, Fortaleza, and other parts of Brazil. Most
of the "informants" belong to the upper-middle class, have
traveled extensively, and seldom reside in the place of their
birth. Only a fraction show any regional speech
characteristics. In comparison with a literary corpus, which we
also processed, there is relatively little slang and few
taboo words are present.

Many of the dialogues deal with travel, state of health,
and lack of correspondence. There are several interesting
little stories, such as: an air-rescue search in the Amazon,
a conversation between a beauty queen and her fiance, a
description of serum for organ transplants, purchases of
furniture, arrangements for scholarships, and a request for
articles for "macumba" sessions. There are also discussions
of arrangements for international conferences, internal
administration of governmental organizations, arrangements
for transmission of radio broadcasts, weather, various
descriptions of American schools, several informal conversations
("bate-papos"), automobile accidents, deaths and funeral
arrangements, discussions of legal affairs, military service,
births, sickness and operations. A few of the dialogues
refer to football games and to the purchase of clothing as
well as to getting goods through customs and shipping
articles to Brazil. There are several exchanges from Rio
Grande do Sul which have the tu form of the verb. In
general, the conversations contain samples of many of the
common subjects that are usually discussed in the language.

Transcription of recordings. 

In transcribing the recordings, every effort was made 
to reproduce as accurately as possible what was actually 
said. It was decided to recognize as words abbreviated 
forms that are frequent in speech. Examples are: cê, né, tou,
tá, and others. Because of obvious limitations in the
frequency range of the reproductions, we did not attempt to
determine the existence or omission of final s or z. The
final "hard copy" was corrected by again listening to the 
original recording. All pauses, hesitations, repetitions, 
and false starts were included and unintelligible sounds 
were so indicated. 

Protocol. 



Before any data could be processed for computer input, 
it was necessary to select which symbols we would use for 
accent marks and also to determine how many items would be 
coded for subsequent identification. At first, we thought
it would be useful to tag verbs and parts of speech. We had
the problem of homographs as well as certain typing pro- 
cedures necessary for optical scanning. 

After considerable time and effort, we completed an
eight page protocol, which, in our opinion, gave us a maximum
of options and required a minimum of extra effort in typing
the final hard copy. For example, we selected the apostrophe
for the acute accent mark since the apostrophe is fairly
similar in appearance. The acute accent mark is also the
most common and for this reason we chose a lower-case keyboard
character. For c cedilla (ç) we used the comma, positioned
after the C. All symbols for diacritics, in spite of practice
to the contrary, were placed after the letter involved so that
in the print-out, words having them would be fairly close to
where they would normally fall in alphabetical order.
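
The trailing-symbol convention described above can be sketched in a few lines of Python. This is only an illustration, not the project's program: the function name encode_for_typing is hypothetical, and only the two mappings named in the text (apostrophe for the acute accent, comma for the cedilla) are included, since the full eight page protocol is not reproduced in this section.

```python
import unicodedata

# Only these two mappings come from the report; symbols for the other
# diacritics are not given in this section, so they are left untouched.
COMBINING_TO_SYMBOL = {
    "\u0301": "'",  # combining acute accent -> apostrophe
    "\u0327": ",",  # combining cedilla -> comma
}


def encode_for_typing(text: str) -> str:
    """Rewrite accented letters as base letter + trailing symbol."""
    decomposed = unicodedata.normalize("NFD", text)
    swapped = "".join(COMBINING_TO_SYMBOL.get(ch, ch) for ch in decomposed)
    # Recompose any marks that were not replaced (for example the tilde).
    return unicodedata.normalize("NFC", swapped)


words = ["esta", "está", "estou", "cabeça", "cabo"]
encoded = sorted(encode_for_typing(w) for w in words)
print(encoded)
```

Because the symbol follows the letter, a plain character-by-character sort keeps está next to esta and ahead of estou, which is the effect the protocol was after.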

Typing for optical scanning.

Probably the most difficult, most expensive, and
time-consuming part of the project was the data preparation of
the corpus for computer input. A keypunch card operation
would have been prohibitive in cost and it also presented
obstacles difficult to surmount. On investigation we found
that we could type for optical scanning without too much
difficulty. An IBM Selectric typewriter (10 pitch) and the
12L2/12F2 typing element were all that was necessary. Data
preparation for scanning is considered to be about 30 per cent 
faster than keypunching. Scanning, depending on the amount 
of material, takes about six to ten seconds a page and the 
cost, on a contract basis, is roughly forty dollars an hour. 
The average typist is capable of performing about 10,000 
strokes an hour while a good operator can keypunch from 
7,500 to 8,000 characters for the same period of time. 

Our task was to produce a typewritten manuscript for
optical character reading with as few errors as possible. We
did not have a correction program easily available since we
were located at some distance from the processing center.
After making trials and test samples of computer print-outs
to correct program errors, we attempted to proceed with
typing the manuscript. On almost every page we would have
one or two errors which required retyping the entire page.
This was extremely discouraging since, when we corrected one
error, we would often introduce others so that new and
complete proofreadings were necessary. The real breakthrough
came when the Office of Education granted us permission to
rent an IBM Magnetic Tape Selectric Typewriter (MT/ST). Now
we could produce a magnetic tape as we typed, correcting as
we went, by backspacing. Upon completion of the
proofreadings of the semifinal copies, the errors were corrected
by going to the reference codes on the magnetic tape and
then skipping lines and characters to arrive at the points
where the corrections were to be made.

With the use of the MT/ST, our production increased
considerably as did our accuracy. In our final computer
print-out of our spoken corpus, we had about 1,000 errors in
the 400,000 words typed, or roughly a quarter of one
percent. This rate by any calculation is extremely good.
There were about 140 errors stemming from the process of
optical scanning, coming either from faulty typewriter
adjustment or the inherent characteristics of the scanner.
The most common errors were those of producing a Q instead
of an O, a K instead of an R, and an N instead of an M. On
four occasions the scanner jumped the last four or five
lines of a paragraph and once omitted almost an entire page.


This experience with our spoken corpus gave us enough 
confidence to attempt a similar processing of our literary 
manuscript. This experience was extremely frustrating.
We suffered all types of problems - the blobs intended for
spaces had to be blotted out by hand and the number of
errors was incredibly large. Also, the errors in the text
titles caused considerable problems. Correcting the errors
in the literary corpus was a slow and painful process. In
the section under recommendations we are advocating a new, 
improved manner of computer input of linguistic data. 



For our spoken corpus there were some seventy errors in
informant numbers. This was a fairly serious situation
since much of the value of informant numbers revolves around
their being used to establish the range of how many different
speakers used each individual word of the corpus. Each
informant error was multiplied by the number of times that
each and every word was used in the passage. On the average
there were about eight informant numbers on each page,
resulting in an approximate total of 7,300 errors alone from
this source. Also, with the print-out in front of us, it
was possible to catch many of the things we had permitted to
slip by unnoticed.
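
To make the role of the informant numbers concrete, the following Python sketch computes both the frequency of each word and its range, that is, the number of different speakers who used it. It is a minimal illustration under assumed inputs: the function name frequency_and_range, the (informant number, text) pairs, and the sample utterances are all hypothetical and are not taken from the project's programs or corpus.

```python
from collections import Counter, defaultdict


def frequency_and_range(passages):
    """Count each word and the number of distinct informants using it."""
    freq = Counter()
    speakers = defaultdict(set)
    for informant, text in passages:
        for word in text.lower().split():
            freq[word] += 1
            speakers[word].add(informant)
    word_range = {word: len(ids) for word, ids in speakers.items()}
    return freq, word_range


# Hypothetical passages, each tagged with an informant number.
passages = [
    (101, "eu vou pra casa"),
    (102, "eu tou aqui"),
    (101, "que bom que eu vou"),
]
freq, word_range = frequency_and_range(passages)
print(freq["eu"], word_range["eu"])
```

A single mistyped informant number changes the speaker set of every word in its passage, which is why a small number of such errors spreads so widely through the range figures.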



Instead of using the large computer print-out to search 
for errors, we produced duplicate Xerox copies of the 15,000
page index for both corpora. Each corpus is now bound in
37 hard back pressure binders, which, with the five volumes 
of original typed transcriptions, are easily stored and 
readily available for consultation. 



FINDINGS AND ANALYSIS 



1. Description of materials produced.



As in the case of any new computer program, it was
necessary to have three test runs on a small portion of the
material to be processed. The final test run served as a
data base for a computerized syntax analysis and is found in
Part II of this report, "Toward a Computerized Syntactical
Analysis of Portuguese." In this way we were able to spot
oversights and change certain features in our concordance
program. With almost no facilities available to begin the
project, we were most fortunate in being able to have the
optical scanning and computer processing done for us by an
agency of the Department of Defense.

The following were completed:

1. Key-Word-in-Context computer program (in BASIC), 120 
characters, with frequency count, upper and lower case, 
accent marks, and informant numbers on the right margin. See 
appendix. 

2. Frequency lists in descending order of occurrence as
well as alphabetical order for both corpora. Computer
programs for both.

3. Typewritten manuscripts and input tapes for both corpora.

4. Computer output tapes of the KWIC of each corpus.

5. Xerox copies of the KWIC index of each corpus (15,000
pages each), on deposit at the U.S. Naval Academy and at the
School of Languages and Linguistics of Georgetown University.

6. Report by Dr. Clea Rameh, co-investigator, entitled,
"Toward a Computerized Syntactical Analysis of Portuguese,"
available through University Microfilms, Ann Arbor, Michigan.

7. Computer program for reverse concordance, input
verification program, and a tape cartridge reading program for
computer inputting from the IBM magnetic tape typewriter.

8. Translation table for alphabetizing all characters from 
the IBM 2741 typewriter computer terminal. This program 
will sort for all languages. 

Our preliminary findings were based on evaluations from
the KWIC index which is indispensable for any scientific
analysis of the language. With all the examples of the word
in question coming in the center of the page in its natural
surrounding, this KWIC index provides a very rapid and 
efficient manner in which authentic examples of the use of 
the target language can be readily found for investigation. 
(See examples of the KWIC index in the Appendix.) 
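
As a rough illustration of how a Key-Word-in-Context line is laid out, the Python sketch below centers each running word in a fixed-width line and prints the informant number at the right margin. It is not the project's BASIC program: the function name kwic_lines, the default width, and the sample passages are assumptions made for the example.

```python
def kwic_lines(passages, width=60):
    """Return one context line per running word, sorted by keyword."""
    half = width // 2
    entries = []
    for informant, text in passages:
        words = text.split()
        for i, word in enumerate(words):
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            # Trim the context so the keyword always starts at column `half`.
            line = (left[-(half - 1):].rjust(half - 1) + " "
                    + right[:half].ljust(half))
            entries.append((word.lower(), f"{line} {informant:>4}"))
    entries.sort(key=lambda entry: entry[0])
    return [line for _, line in entries]


# Hypothetical passages, each tagged with an informant number.
sample = [(101, "eu vou pra casa"), (102, "eu tou aqui")]
for line in kwic_lines(sample, width=40):
    print(line)
```

Sorting on the keyword brings every occurrence of a word together, each with its surrounding context, so that usage can be scanned down a single column.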

In the 400,000 word spoken language corpus, after 
deducting hesitations, false starts, pauses, repetitions, 
and errors in syntax, about 12,000 different words were 
found. In contrast, our literary corpus, which was about 
the same size, had 29,375 different words. Thus, our 
spoken corpus had a vocabulary range of only about forty 
percent of that of the literary corpus. This figure is
probably the same for other languages as well. 



a. Nouns.

Our spoken corpus had some 1,685 names of persons,
nicknames, and diminutives of them. About 285 geographical
names were present - names of countries, states, cities,
streets and so forth. Proper names, that is, commercial
trade names, types, et cetera had 231 occurrences. There
were at least 44 standard abbreviations (siglas) and, in
spite of the fact that the conversations were between Brazil
and the United States, only 146 words in English appeared,
most of them in wide use in Brazil for some time. Examples
are: video tape, slides, Batman, drinks, long plays, tapes,
time (for team), slacks and the diminutive slaquezinho,
sueter for sweater, and finally striptease. Of these
12,000 different words, a total of about 5,400, or
forty-five percent, were used only once. It should be noted that
all verbal forms were counted and included in the total.
Shortened colloquial forms, even though they do not exist as
"accepted" words, were included as were interjections. Of
the 12,000 word total only 3,851, or thirty-two percent,
were used five times or more.
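
The shares quoted in this section can be re-derived from the counts, and the same measures can be taken from any frequency list. The Python sketch below does both; the function name vocabulary_profile and the toy word list are hypothetical, while the report figures are the ones given in the text.

```python
from collections import Counter


def vocabulary_profile(freq):
    """Summarize a frequency list: types, once-only share, 5+ share."""
    types = len(freq)
    once = sum(1 for n in freq.values() if n == 1)
    five_plus = sum(1 for n in freq.values() if n >= 5)
    return {
        "types": types,
        "once_share": once / types,
        "five_plus_share": five_plus / types,
    }


# Toy frequency list (hypothetical), only to show the function at work.
toy = Counter("que eu que o a eu que que que pra".split())
profile = vocabulary_profile(toy)
print(profile)

# Figures quoted in the report, recomputed as percentages.
spoken_types, literary_types = 12_000, 29_375
print(round(100 * 5_400 / spoken_types))           # words used only once
print(round(100 * 3_851 / spoken_types))           # words used five times or more
print(round(100 * spoken_types / literary_types))  # spoken vs. literary range
```

The last ratio works out to a little under 41 percent, which the report gives as about forty percent.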

Brazilian proper names are included, and in this study 
we find a predominance of double first names. There were 33
different combinations with Maria, such as Maria Cecilia and
Maria Helena. The double form also occurred with masculine
names, but not to the same extent. There is also the use of
the definite article when using proper names, except in the 
case of direct address. 

b. Others.

While it is not surprising that the function word que
was the most common (de in the literary corpus), we did not
expect to find eu as number two, especially since the first
person of the verb does not require a subject pronoun. As
yet we have not determined whether more men than women use
eu in the daily conversations. In asking a large number of
Brazilians which words they regarded as most common only one
was successful in guessing que and no one could imagine that
eu was the second most common in Brazilian speech. Other
interesting points are that the masculine definite article o
is more common than the feminine (9,056 compared to 6,978),
but the feminine plural as occurs more frequently than os
(1,387 vs. 979). The colloquial form pra is more than three
times more frequent than para. The shortened forms tá
and tou are considerably more common in speech than está and
estou. The word aí appeared some 2,533 times (as contrasted
to 13 for ali, 1,138 for lá, and 3,900 for aqui). In our
conversations the word aí was used to indicate the location
of the second person in the conversation and distance as
such, whether close or 5,000 miles away, played no part
in choosing the adverb of place. Considering the high
frequency of aí, we would do well to stress its importance
in our teaching.



One word that is rarely found in written form is OK of 
which we had 477 examples from the conversations but not a
single one in the written corpus. In speech esse (459 to 67)
is much more common than este. The use of the subject
pronouns ele and ela as direct object pronouns occurred 81 times.

The use of prepositions is easily found by consulting 
the KWIC index. While a few examples of chegar a Nova Iorque
were found, there were about ten times as many occurrences of
chegar em Nova Iorque. From this same source it would be
fairly easy to obtain hundreds of examples of the uses of por
and para.

Probably sensing the need for a less ambiguous
possessive adjective, many Brazilians used teu and tua
instead of seu and sua with você as the subject pronoun.
Example: Você com a tua serenidade, teu equilíbrio -
However, there were 802 seu, 971 sua, 82 teu, and 135 tua
occurrences.

c. Definite articles. 



From our KWIC index, we find that our informants used
the definite article when talking about persons not present.
Example: Eu posso falar com a Maria Lucia? In ninety
percent of the occasions the definite article was used. The same
took place with prepositions to form contractions with the
articles: Agora mesmo chegamos do casamento da Marieta.

The use of the definite article together with the
possessive adjective was also widespread. With 1,497
occurrences of seu and sua, the definite article appeared before
the possessive adjective 867 times or 57 percent of the total
usage. In this group 334 examples, or 22.3 percent, combined
the definite article with the preceding preposition to form
a contraction. There were 83 forms of do seu, but only 18 of
de seu; 79 of da sua and 34 of de sua; 22 of ao seu with 10
of a seu; no seu 28 and em seu 3; na sua 34 and em sua 6; and
finally pelo seu 18 and por seu 6; pela sua 21 and por sua 4.
In summary then, of the 366 possible occasions in which our
informants could have used a definite article to form a
contraction with a preposition, the definite article was used 285
times or about 78 percent. From this we may conclude that in
the spoken language the Brazilian prefers to use the definite
article with the possessive adjective. In addition we found
that seu was used immediately before names of persons some
286 times as a substitute for senhor in speech.
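
The summary figures for contractions can be checked by tallying the individual counts listed above. The Python sketch below does so; the table layout and variable names are only an illustration, and the rows for em rest on the assumption that the garbled forms in the scanned text stand for em seu and em sua.

```python
# (preposition, possessive): (with article, without article)
# Counts are those listed in the text; the two "em" rows assume a
# particular reading of a garbled form in the scanned report.
counts = {
    ("de", "seu"): (83, 18),   # do seu / de seu
    ("de", "sua"): (79, 34),   # da sua / de sua
    ("a", "seu"): (22, 10),    # ao seu / a seu
    ("em", "seu"): (28, 3),    # no seu / em seu
    ("em", "sua"): (34, 6),    # na sua / em sua
    ("por", "seu"): (18, 6),   # pelo seu / por seu
    ("por", "sua"): (21, 4),   # pela sua / por sua
}

with_article = sum(pair[0] for pair in counts.values())
without_article = sum(pair[1] for pair in counts.values())
total = with_article + without_article
share = 100 * with_article / total

print(with_article, without_article, total, round(share))
```

The tally reproduces the 285 contractions out of 366 occasions, or about 78 percent, stated in the summary.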

d. Variants.

For the expression last week there was one example of
a última semana with thirty-five for a semana passada. For
[text illegible] na próxima semana, one
for a semana próxima and fifty-seven for a semana que vem.
To give the day of the month the common form was: no dia
dois.



There were forty-five examples of o telefonema, but a
telefonema appeared nine times. Questão without pronouncing
the u occurred sixty-four times (being used to a much
greater extent by the younger generation) while qüestão had
nineteen examples. Faz favor was found only twenty-nine
times compared to 165 sentences with por favor.

For the Brazilian equivalent of tonight, there were
fifteen occurrences of hoje à noite, three of hoje de noite,
but only one for esta noite. There were nineteen examples
of the salutation um boa noite contrasting with twenty-four
of uma boa noite. Roughly ten percent of the occurrences of
outro were preceded by the indefinite article um and the
same held true for outra with uma. Example: E depois eu
dou a você uma outra resposta.

With several nouns there are the two possible choices 
in forming expressions as in tenho m uitas saudades or estou 
coir, muitas saudades . In our corpus Brazilians prefered the 
latter forms by 100 to one, and in this very expression the 
women would usually say: estou morrendo de s audades . For 
being hot or cold the form v/as: estou sontindo mnlto frio . 

A cursory check to see the degree to which the Brazilian 
uses the subject pronoun eu, showed that . eu v/as used in over 
haif the occasions in which it could have been spoken. 

There v/as a tendency not to repeat i.t if it had been used in 
the first clause of a sentence. Certain sociologists, 
noting that the Brazilian male had a tendency for saying eu 
e a minha mulher, interpreted this as a case of machismo . 
However, in cur corpus the v/omen also put themselves first 
when speaking as in: Eu e Roberto f izemos tudo nr a ver 

se ela . . . and also Eu e Ney morremos de saudades . 

Finally, there were ninety-nine occurrences of nao tern 
problema v/hile the more "elegant" nao ha problema appeared 
some sixty-one times. Of course, there are many more 
features of the spoken language which can be checked and 
counted. It v/ould be useful to have a spoken corpus of 
about a million words because, in some cases, there are not 
enough examples so that accurante findings can be determined 
v/ith precision. 
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e. Description of Verbal Forms . 

It is somewhat difficult to establish just what constitutes
a verb or verbal form. In any case the following
guidelines were used for the frequency count.

1. The infinitive includes all those forms used in the
conversational future, such as: Eu vou ver. No distinction
is made between this group and the personal infinitives of
the first and third persons singular.

2. The present participles include all forms regardless
of function.

3. The past participles are divided into three groups:
a) those used with estar in any tense, b) those used with
ser, and c) all other examples including those forming compound
tenses and those functioning adjectively other than
with estar.

4. Progressive tenses are those forms used with the
various tenses of estar and the present participle.

5. The conversational future, formed with the present
of ir and the infinitive, is counted separately even though
the forms of ir and the infinitives are also counted
individually. No distinction is made between vamos "let's" and
vamos "we will."

6. The classification imperative substitute designates
those forms of the third person singular of the present
indicative that are popularly used for commands with voce
as the subject pronoun.

7. The emphatic future is the form composed of the
present indicative of haver and the preposition de and a
dependent infinitive. Example: Eu hei de saber.



Totals . 



There were a total of 89,209 different items, but it 
should be stated that compound forms were usually counted 
three times, one for each part and another for the complete 
unit. Deducting these duplicates produced a total of 81,091 
verbal forms. The table on the following pages gives a 
breakdown according to the various verbal forms. 



VERBAL FORM                          OCCURRENCES   PERCENT   ADJUSTED*

Infinitive                                15,335     17.25       18.97
Personal Infinitive                          215      0.24        0.27
Conversational Future                      3,695      4.14
Emphatic Future                               13      0.01
Present Participle                         4,613      5.17        5.69
Past Participle                            3,371      3.78        4.16
    With estar      543
         ser        515
         others   2,213
Present Indicative                        32,062     35.94       39.54
Preterite                                 12,309     13.80       15.18
Imperfect                                  2,266      2.54        2.79
Future Indicative                          1,587      1.78        1.96
Conditional                                  742      0.83        0.92
Present Subjunctive                        1,612      1.81        1.99
Imperfect Subjunctive                        579      0.65        0.71
Future Subjunctive                         1,559      1.75        1.92
Infinitive Perfect                           180      0.20
Present Perfect Indicative                   604      0.68
Past Perfect Indicative
    With ter                                 139      0.15
         haver                                48      0.05
Simple Pluperfect                              1      0.00        0.00
Future Perfect Indicative                      9      0.01
Conditional Perfect                           12      0.01
Present Perfect Subjunctive                   93      0.10
Past Perfect Subjunctive                      30      0.03
Future Perfect Subjunctive                     7      0.01
Imperative Substitute (colq)               2,875      3.22        3.55
Command - 3rd person                       1,729      1.94        2.13
Command - tu (2nd person)                    134      0.15        0.17
Infinitive Progressive                        50      0.06
Present Progressive                        2,779      3.12
Preterite Progressive                         16      0.02
Imperfect Progressive                        165      0.18
Future Progressive                            36      0.04
Conditional Progressive                        3      0.00
Present Subjunctive Progressive               39      0.04
Imperfect Subjunctive Progressive              6      0.01
Colloquial forms -
    esteje, estejem                            8      0.01
    seje                                       7      0.01
    vim (substitutes for vir)                 37      0.04

* Based on 81,091 verbal forms; compound forms were
counted only once.
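
The two percentage columns follow directly from the two totals given under Totals. As a minimal sketch (the function name and the choice of three sample rows are assumptions made for illustration), each count is divided once by the full 89,209 items and once by the adjusted 81,091 verbal forms:

```python
TOTAL_ITEMS = 89_209      # every counted item, compound forms included
ADJUSTED_TOTAL = 81_091   # after deducting the duplicate counts of compound forms


def percentages(occurrences):
    """Return (percent of all items, percent of adjusted total), to 2 places."""
    return (round(100 * occurrences / TOTAL_ITEMS, 2),
            round(100 * occurrences / ADJUSTED_TOTAL, 2))


# Three rows taken from the table of verbal forms.
counts = {"Present Indicative": 32_062, "Preterite": 12_309, "Imperfect": 2_266}
for form, n in counts.items():
    print(form, *percentages(n))
```

Compound forms have no adjusted figure in the table, since their parts are already included in the simple tenses once the duplicates are removed.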
As can be seen from the distribution of verbal forms,
the present indicative is by far the most common tense in
spoken Brazilian Portuguese. Counting all the present
indicative verbal forms, including those forming perfect and
progressive tenses, we find some 32,000 examples or a little
less than thirty-six percent of the total. Subtracting the
compound forms increases the present indicative tense up to
nearly forty percent. Below are some of the verbs which
show a preponderant usage in the present indicative.


VERB      OCCURRENCES   NO. IN PRESENT   PERCENT

estar           9,474            7,370      77.8
ir              7,487            5,582      74.8
ser             7,182            4,806      66.9
ter             4,099            2,292      55.9
querer          2,602            1,509      58.0
saber           2,451            1,227      50.1
poder           2,010            1,001      49.8
dever             819              663      80.9



Next in importance came the infinitive with almost 18%
of the occurrences. The conversational future (3,695 ex.)
was more than twice as common as the future indicative
(1,587 ex.). Also, some verbs such as querer were never
used in the future, nor in the conditional, because,
according to many Brazilians queried, of the rather harsh
sounds created in pronouncing these verbal forms.

There were 12,309 preterite forms, but 1,029 of them
were viu, a common corruption which is used as an interjection
seeking confirmation of a previous statement or perhaps
the equivalent of "Y'hear?" Chegar, dizer, entender,
escrever, escutar, fazer, falar, mandar, ouvir, pedir, receber,
responder, seguir, sofrer, and telefonar are verbs which
have, by far, more of their finite forms in the preterite
than in any other tense.

In the spoken language the imperfect is much less frequent
than in the written form. Of the 2,266 imperfects,
1,939 are confined to only eight verbs:

    querer    591        ir       174
    estar     307        ser      172
    ter       300        haver     67
    'tava     203        poder     63
    (shortened form      saber     62
    of estar)

It is evident that the high frequency of querer in the
imperfect stems, in part, from its use as a conditional
substitute. There were only 26 examples of the preterite quis.
In contrast, gostar has almost no imperfect forms, but it
was the most common verb in the conditional tense.
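
The concentration of the imperfect in a handful of verbs, reported above, can be checked by adding up the listed counts. This is a minimal sketch; the dictionary keys are labels chosen here for illustration, and 'tava is kept as a separate entry even though the report counts it under estar.

```python
# Imperfect counts as listed in the report ('tava is the shortened form of estar).
imperfect_counts = {
    "querer": 591, "estar": 307, "ter": 300, "'tava (estar)": 203,
    "ir": 174, "ser": 172, "haver": 67, "poder": 63, "saber": 62,
}

listed = sum(imperfect_counts.values())
all_imperfects = 2_266
share = round(100 * listed / all_imperfects, 1)  # percent of all imperfects

print(listed, share)
```

The nine listed figures add up to the 1,939 quoted in the text, so these eight verbs account for roughly six out of every seven imperfect forms.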



For the great majority of verbs there were very few or
no occurrences in the future indicative. Estar (161), ir
(151), ser (137), and poder (104) were the most common.

Only a small number of verbs had conditional forms:

    gostaria   231        iria       25
    poderia     73        ficaria    17
    teria       33        deveria    14
    pediria     28        estaria    12
                          viria      11

The occurrences of the future subjunctive were confined
to a rather limited number of verbs. Of the 262 examples of
quiser, some 125 came from the expression: Deus quiser.
Most of the remainder consisted of some form of: Se quiser.


In contrast to English and to Spanish,14 spoken Brazilian
Portuguese makes relatively little use of the present perfect
tense. The occurrences were confined to a few verbs, the
most common being:

    escrever   52        fazer     30
    receber    67        chegar    23
    ser        61        ir        17
    ter        48        mandar    15
    estar      30        sair      11


The remaining perfect (compound) tenses are even less
frequent in the spoken language. To form the past perfect
indicative the auxiliary ter was used 139 times compared
to 48 uses of a form of haver. However, our literary corpus
showed an almost equal usage of the forms of ter and haver.
Only one example was found of the simple (or literary)
pluperfect tense.






All the subjunctive tenses together came to about five
percent of the total number of occurrences. Some of the
more common verbs used in the present subjunctive were:

    ter      185        dar         47
    ser      113        chegar      46
    estar    104        vir         44
    poder    101        dizer       33
    ir        77        escrever    33


















Se, quando, assim que, logo que (which also introduces the
present subjunctive), o que (for whatsoever), o mais que,
qualquer coisa que, o primeiro que, and sempre que were used
to introduce the future subjunctive. Most of the future
subjunctives were found in the following verbs:

    querer   262        ir          55
    poder    190        precisar    51
    ter      138        vir         49
    ser      129        haver       45
    chegar   110        conseguir   31
    estar    101        achar       29


The colloquial command form, which we call the
imperative substitute, is actually the third person singular
of the present indicative. It is half again as common (2,875
vs. 1,729) as the standard third person command for voce,
o senhor, and a senhora. It should be noted, however, that
1,027 of these occurrences were for the attention getter
olha. The most common imperative substitutes were:

    olhar     1027        esperar     75 ('pera 44)
    dizer      208        fazer       58
    dar        202        deixar      48
    falar      196        ficar       45
    ver        105        telefonar   39
    mandar      99        responder   38
    escutar     90        pedir       38

A look at some verb forms in the KWIC index brings out
some interesting things. There are 66 examples of buscar,
but not one of them is a finite verb, much in contrast to
the near complete set of verbal forms for procurar. A
typical example would be: Eles vao te buscar no hotel.



Then there is the question of tenho de falar or tenho
que falar. Our findings were very conclusive: tenho que
was used almost 99 percent of the time in the spoken corpus.
A check of our literary corpus, which is of similar size,
produced an equal number of examples for both expressions.



Unusual irregular forms appeared: seje, esteje, and
estejem as well as their shortened variants teje and
tejem. This is unconscious overcorrection. Example: Desejo
que tudo teje correndo bem. There were also three examples
of escreviao and 18 of deixa eu. Subjects and verbs failed
to agree in number in many sentences. Then there were 37
cases in which vim was used to replace vir as the infinitive
in examples such as: Voce tem que vim em dezembro de
qualquer maneira.

Finally, a word regarding the person of the verb. From
the KWIC index we find that the third person singular is the
most common, closely followed by the first person singular.
The plural forms, however, have only about one-tenth the
frequency of the singular forms.

In our spoken corpus voce was the common form of
address (6,771 occurrences and also used 132 times as a
direct object pronoun) in contrast to o senhor (419 times)
and a senhora (742 times). Sons and daughters usually used
voce when addressing a parent, but would use a senhora when
speaking to their mother-in-law. Most of the individuals
who used o senhor and a senhora when addressing their own
parents individually used voces when speaking or referring
to both of them. In the plural this contrast was even
greater: we had 1,479 occurrences of voces but only twenty-one
of os senhores. Also, there were 353 examples of tu,
but not a single vos. Once again, those who used tu in the
singular also used voces in the plural.

On the following pages are the lists by descending 
order of frequency and also by alphabetical order for all 
verbs having forty or more occurrences. On the alphabetical 
list there is a notation as to the most common verb form 
found for each different verb. Complete lists of verb-form 
frequencies of all 650 verbs are available to scholars on 
request. 



Conclusions and Recommendations.

The findings presented in this report are only some of
the highlights from the KWIC index of the spoken corpus
[...]. A syntactical analysis remains to be done. Now that
the spoken corpus and also a literary corpus are available
in concordance form, it is hoped that others will take
advantage of this opportunity. For the present the KWIC
indexes will be available at the Naval Academy and also at
the School of Languages and Linguistics of Georgetown
University. We are attempting to produce microfiche copies
as well as computer tapes of the KWIC lists.



During the period of the contract several events took
place. Methods for inputting manuscripts to the computer
were developed, time-sharing systems became available, and
new hardware could effectively deal with the large volume
of data to be processed. In other words, were the project
started over today, there would be many changes in methods
and many of the difficulties we experienced would no longer
be present.

Based on our experience optical scanning should be
rejected as a means for inputting data to the computer. The
error rate is too high and the correction procedure is too
cumbersome. A key punch operation is also not the most
efficient method. Instead, one of the best ways to input
data is by a time-sharing system from a remote terminal. The
IBM 2741 teletypewriter terminal is one of the most
versatile since it has the capability of having the
interchangeable sphere, a feature which will work for various
different languages. The 2741 then works on a disk pack
and corrections can be made from the terminal. Also, a
computer verification program can be used to catch some of the
more obvious errors. But perhaps the best method is to have
the manuscripts typed twice, once each by two different
persons. A comparison program reads both files, printing
all discrepancies. The human proofreader then has no more
to do than to select the correct form. Of course, the
correct material is not touched and additions or deletions
can be made without disturbing the original data. Once the
data bank is on the disk pack a very efficient edit software
package provides for making text replacements, changing
thousands of examples of a unique string by one single
command. Such a feature is most important, for during the
period of our contract the Brazilian government changed the
official orthography of Portuguese. Once we get the
magnetic tapes on our system, we should be able to make the
changes without too much trouble.
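
The double-typing check described above can be illustrated with a short Python sketch. It is not the comparison program used in the project; it assumes, for simplicity, that both typists keep the same line breaks, and the function name and sample lines are invented for the example.

```python
from itertools import zip_longest


def discrepancies(first_typing, second_typing):
    """Yield (line_number, text_a, text_b) wherever the two typings differ."""
    pairs = zip_longest(first_typing, second_typing, fillvalue="")
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            yield number, a, b


# Two hypothetical typings of the same three manuscript lines.
typist_a = ["Eu vou ver.", "Voce tem que vir.", "Ate logo."]
typist_b = ["Eu vou ver.", "Voce tem que vim.", "Ate logo."]

for number, a, b in discrepancies(typist_a, typist_b):
    print(f"line {number}: {a!r} <> {b!r}")
```

Only the lines that differ are reported, so the proofreader sees just the places where a choice has to be made; lines on which both typists agree are never touched.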
In preparing the manuscript for computer input there
should be the least amount of coding possible. Coding for
syntactical items only leads to more errors and
additional programming problems. Since the items fall out
together in the KWIC index and also in the reverse
alphabetical concordance, very little is gained. The possible
exception is the case of homonyms as, for example, the
preterite forms of ser and ir and the preposition a. Here
a non-print character can be affixed to the word in question
and, while it will not appear, all occurrences will be
sorted separately.
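
The homonym device described above can be sketched as follows. The choice of control character, the helper name, and the decision to tag the ir reading rather than the ser reading are all assumptions made for illustration; the report does not say which character or which reading was marked.

```python
TAG = "\x01"  # assumed non-printing marker; any control character unused in the text would do


def display(word):
    """Strip the marker so the printed form is unchanged."""
    return word.replace(TAG, "")


# "foi" + TAG stands for the preterite of ir; the bare "foi" stands for ser.
tokens = ["foi", "foi" + TAG, "falar", "foi", "foi" + TAG]

for word in sorted(tokens):
    print(display(word), "(tagged)" if TAG in word else "")
```

Because the tagged spelling is a different string, an ordinary sort groups the two readings separately, while the printed form stays identical.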



Although using upper and lower case adds considerably
to computer running time, this feature will provide for a
distinction between proper and common nouns. At the same
time there is an exact reproduction of the original text.
The use of accent marks adds to the problem since a special
program must place words with an accent mark immediately
after a similar one without the accent mark, as in dictionary
form. Without a special compensating computer program words
with accent marks will be alphabetized out of order and will
be the first words after the beginning of each letter.
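
The dictionary ordering described above, with an accented word placed immediately after its unaccented counterpart, can be sketched with a two-part sort key. This is an illustration in present-day Python using the standard unicodedata module, not the compensating program the authors had in mind; the function name and word list are invented for the example.

```python
import unicodedata


def dictionary_key(word):
    """Sort on the unaccented spelling first, then on the original spelling."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.lower(), word)


words = ["esta", "ate", "e", "está", "até", "é", "eu"]
print(sorted(words, key=dictionary_key))
```

A plain code-point sort of the same list would push é to the very end, which is the kind of misplacement the compensating program has to prevent.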
RANK  VERB           FREQ      RANK  VERB            FREQ

   1  estar          9474        51  correr           211
   2  ir             7437        52  continuar        210
   3  ser            7182        53  chamar           207
   4  ter            4099        54  providenciar     203
   5  falar          3299        55  entrar           199
   6  mandar         2782        56  ligar            193
   7  querer         2602        57  trazer           193
   8  saber          2451        58  encontrar    [illeg.]
   9  dizer          2166        59  aproveitar       192
  10  poder          2010        60  sentir       [illeg.]
  11  passar         1921        61  comecar      [illeg.]
  12  dar            1817        62  tentar           187
  13  fazer          1802        63  parecer      [illeg.]
  14  ficar          1772        64  trabalhar        184
  15  receber        1706        65  pensar       [illeg.]
  16  chegar         1392        66  viajar           180
  17  escrever       1374        67  explicar         179
  18  ouvir          1260        68  conversar        173
  19  entender       1176        69  esquecer         172
  20  vir            1169        70  despedir     [illeg.]
  21  olhar          1144        71  pagar            153
  22  esperar        1067        72  entregar         151
  23  ver            1039        73  andar            146
  24  viu?           1029        74  confirmar        141
  25  dever           819        75  repetir          140
  26  haver           746        76  comunicar        139
  27  pedir           737        77  tratar           139
  28  sair            684        78  acontecer    [illeg.]
  29  achar           671        79  tirar        [illeg.]
  30  precisar        617        80  compreender      134
  31  telefonar       533        81  acabar       [illeg.]
  32  voltar          491        82  estudar          125
  33  preocupar       473        83  enviar           123
  34  levar           469        84  botar            118
  35  gostar          445        85  desejar          115
  36  conseguir       413        86  mudar            112
  37  avisar          411        87  depender         110
  38  resolver        407        88  pegar        [illeg.]
  39  deixar          371        89  arranjar         108
  40  escutar         356        90  informar     [illeg.]
  41  aguardar        352        91  embarcar     [illeg.]
  42  responder       331        92  faltar           103
  43  comprar         325        93  preparar         101
  44  procurar        320        94  terminar         100
  45  tomar           299        95  marcar            99
  46  agradecer       246        96  combinar          96
  47  morrer          233        97  transmitir        95
  48  seguir          229        98  demorar           95
  49  perguntar       217        99  lembrar           95
  50  contar          215       100  conhecer          90

RANK  VERB           FREQ  (continued)

 101  desligar         90
 102  melhorar         86
 103  arrumar          85
 104  por              85
 105  ajudar           84
 106  pretender        83
 107  perder       [illeg.]
 108  adorar           80
 109  passear          76
 110  acreditar        74
 111  buscar           73
 112  remeter          72
 113  chorar           71
 114  jantar           71
 115  morar            67
 116  atender          62
 117  apanhar          61
 118  ocupar           61
 119  colocar          60
 120  descansar        60
 121  ditar            59
 122  adiantar         56
 123  crer             52
 124  dormir           51
 125  comer            50
 126  cuidar           50
 127  parar            50
 128  recuperar        50
 129  ler              49
 130  custar           48
 131  interessar       48
 132  gastar           44
 133  abusar           43
 134  alegar           43
 135  acertar          42
 136  aprender         41
 137  despachar        40
VERB            TOTAL   MOST COMMON FORMS

abusar             43   inf 17
acabar        [illeg.]  pret 63
acertar            42   pres ind 13
achar             671   pres ind 495 (acho 323)
acontecer     [illeg.]  [illeg.]
acreditar          74   pres ind 69 (all are acredito)
adiantar           56   pres ind [illeg.] (all are adianta)
adorar             80   pret 34
agradecer         246   pres ind 96
aguardar          352   pres part 109
ajudar             84   [illeg.]
alegar             43   inf 12, pret 11
andar             146   pres ind 42
apanhar            61   inf 38
aprender           41   inf 12
aproveitar        192   inf 49
arranjar          108   inf 32
arrumar            85   [illeg.]
atender            62   [illeg.]
avisar            411   inf 159, imp sub 59
botar             118   inf 43, pret 36
buscar             73   inf 57, used only in inf & conv fut
chamar            207   inf 52, pres ind 38
chegar           1392   pret 354, inf 239, pres ind 136
chorar             71   pres part 20, inf 17
colocar            60   inf 24, pret 11
combinar           96   past part 20
comecar       [illeg.]  pret 47
comer              50   pres part 17
comprar           325   inf 132, pret 80
compreender       134   pret 63, pres ind 34
comunicar         139   inf 64
confirmar         141   inf 42
conhecer           90   pres ind 35
conseguir         415   pret 152
contar            215   pres part 80
continuar         210   pres ind 75
conversar         173   inf 65
correr            211   pres prog 76
crer               52   creio only form
cuidar             50   commands 16
custar             48   pres ind 14
dar              1817   inf 552, pret 202, imp sub 202
deixar            371   inf 129, pret 72
demorar            95   inf 34
depender          110   pres ind 42 (depende 36)
descansar          60   past part 43
desejar           115   pres ind 70
desligar           90   inf 42
despachar          40   inf 12
despedir      [illeg.]  inf 74 (conv fut 43)
dever             819   pres ind 663 (deve 447)
ditar              59   past part 25
dizer            2166   [illeg.] (diga 294)
dormir             51   [illeg.]
embarcar      [illeg.]  [illeg.]
encontrar     [illeg.]  [illeg.]
entender         1176   pret 666 (entendi 415, entendeu 236)
entrar            199   inf 77
entregar          151   inf 50, pret 37
enviar            123   [illeg.]
escrever         1374   pret 405, inf 304
escutar           356   pret 101, escuta imp sub 90
esperar          1067   [illeg.] ('pera 44)
esquecer          172   commands 61, pret 33
estar            9474   pres ind 7370 (ta 3194, tou [illeg.])
estudar           125   pres part 44
explicar          179   pres part 45
falar            3259   inf 1350
faltar            103   pres ind 44 (only 3rd persons)
fazer            1802   inf 511, pret 233
ficar            1772   inf 565
gastar             44   inf 15
gostar            445   cond 235 (gostaria 231), pret 104
haver             745   only 3rd sing.: ha 435, houve 95, houver 45
informar      [illeg.]  inf 30
interessar         48   past part [illeg.]
ir               7437   pres 5502
jantar             71   inf 33
lembrar            95   pres ind 26
ler                49   pret 18
levar             469   inf 163, pret 77
ligar             193   inf 163, pret 77
mandar           2732   pret 724, inf 628
marcar             99   inf 22
melhorar           86   pret 29
morar              67   pres ind 25
morrer            233   pres part 100
mudar             112   pret 40, inf 36
ocupar             61   inf 14
olhar            1144   olha (attention getter) 1027, olho 87
ouvir            1260   pret 633
pagar             153   inf 70
parar              50   inf 13
parecer       [illeg.]  pres ind 163 (parece 161)
passar           1921   pres prog 305, inf 259
passear            76   inf 22
pedir             737   pret 215, pres ind 143
pegar         [illeg.]  pres part 46
pensar        [illeg.]  [illeg.]
perder        [illeg.]  [illeg.]
perguntar         217   [illeg.]
poder            2010   pres ind 1001 (pode 683)
por                85   pret [illeg.]
precisar          617   pres ind 346 (precisa 210, preciso 106)
preocupar         473   commands 196, past part 194
preparar          101   pres part [illeg.]
pretender          83   pres ind [illeg.]
procurar          320   inf 113
providenciar      203   inf 52
querer           2602   pres ind 1509, imperf 591, fut sub 262
receber          1706   pret 1093 (recebi 543, recebeu 256)
recuperar          50   pres part 17
remeter            72   pret 19, inf 19
repetir           140   inf 52
resolver          407   inf 127
responder         331   inf [illeg.], pret 63
saber            2451   pres ind 1227 (sei 697), inf 1005
sair              684   inf 201
seguir            229   pret 67
sentir        [illeg.]  pres ind 55
ser              7182   pres ind 4806 (e 4218, [illeg.] 202)
telefonar         533   inf [illeg.], pret 124
tentar            187   inf 43 (conv fut 39), pret 37
ter              4099   pres ind 2292 (tem (sing.) 1240)
terminar          100   inf [illeg.]
tirar         [illeg.]  [illeg.]
tomar             299   inf 109
trabalhar         184   inf [illeg.], pres prog 39
transmitir         95   [illeg.]
tratar            139   inf 31, pres ind 27
trazer            193   inf 91, pret 36
ver              1039   inf 527 (vou ver 86, vamos ver 53)
viajar            180   [illeg.]
vir              1169   pres ind 461, vim for vir 37
viu?             1029   interjection (form of ver)
voltar            491   inf [illeg.], pres 74
DISTRIBUTION OF INFORMANTS BY PROFESSION

Male     Female     Total

Aviation
Banking
Businessman
Diplomat
Army
Clerk (office)
High government official
Housewife*
Engineer (professional)
Journalist (newspaper)
Child
Lawyer
Physician
Navy
Writer
School teacher
Security police
Radio & TV personnel
Student
Clergy
University professor
Farming
Native of Portugal
Not identified
Business executive
Non-natives


Total 



10 

15 

10 

21 

9 

*■? 

/ 

0 

14 

2 

6 

6 

19 

36 

0 



12 

43 

12 

3 

1 

97 

9 

3 



3S7 



tv 

0 

9 

0 

n 

0 

1 

-L 

13 

0 

t 

72 

0 

0 

T 



0* 

0 

5 



470 



23 
12 
16 
12 
21 
3 4 
7 

335 

14 
2 

15 

6 

20 



115 

1 

13 

2 

2 

97 

o 



837 



* Many of the women who could not be identified as to profession
were placed in the housewife category.
Author and Selections List

Code   Pages   Selection

 1. [code illeg.]  [illeg.], [illeg.], 97-108
    Adonias Filho, Bahia, 1915: Corpo vivo. Rio de Janeiro,
    Ed. Civilizacao Brasileira, [illeg.].

 2. [code illeg.]  7-20, 59-[illeg.], 125-143
    Andrade, Carlos Drummond de, Minas, 1902: Contos de aprendiz,
    3a Ed. Rio de Janeiro, Editora do Autor, 1963. 207 pp. (First
    Edition, Rio, Jose Olympio, 1951)

 3. RB  37-67, 99-123, 191-211
    Braga, Ruben, Espirito Santo, 1913: Ai de ti, Copacabana. 4a
    Ed. Rio de Janeiro, Editora do Autor, 1960. 222 pp. (1a Ed.
    1960).

 4. JC  1-23, [illeg.], 114-[illeg.]
    Jose Conde, Pernambuco, 1916: Um ramo para Luisa. 3a Ed. Rio,
    Editora Civilizacao Brasileira, 1961. 145 pp. (1a Ed.: Rio de
    Janeiro, Civilizacao Brasileira, 1959).

 5. CC  17-50, 86-99, 127-139
    Cony, Carlos Heitor, Rio, 1926: Antes o verao. Rio de Janeiro,
    Editora Civilizacao Brasileira, 1964. 171 pp.

 6. AD  1-13, 36-[illeg.], 84-96
    Dourado, Waldemiro Autran, Minas, 1926: Uma vida em segredo.
    Rio de Janeiro, Editora Civilizacao Brasileira, 1964. 103 pp.

 7. OF  3-12, 107-119, 261-272
    Faria, Octavio de, [illeg.], 1903: O anjo de pedra [illeg.].
    Rio de Janeiro, Jose Olympio, 1963. 390 pp. (Tragedia
    Burguesa, [illeg.])

 8. GF  [illeg.]-42, 58-69, [illeg.]-100, 136-145
    Figueiredo, Guilherme, Sao Paulo, 1918: [title illeg.].
    [illeg.] Civilizacao Brasileira, 1961. 257 pp.

 9. [code illeg.]  [illeg.]-52, 103-115, [illeg.], [illeg.]-204
    Fonseca, Emi Bulhoes Carvalho da: [illeg.] silencios. Rio de
    Janeiro, Livraria Freitas Bastos, 1961. 229 pp.

10. [code illeg.]  [pages illeg.] (23 cronicas)
    Henrique, Luis, Bahia-Pernambuco, 1926: [title illeg.]
    passarinho. Rio, Tempo Brasileiro, [illeg.]. 139 pp.

11. [code illeg.]  1-16, 29-[illeg.], 70-91
    Ivo, Ledo, Alagoas, 1914: O sobrinho do general. Rio de
    Janeiro, Ed. Civilizacao Brasileira, 1964. 124 pp.

12. AL  31-45, 82-103, 130-145
    Leite, Ascendino, Paraiba, 1915: A prisao. Rio de Janeiro,
    Ed. O Cruzeiro, [illeg.]. 212 pp.

13. OL  1-14, 94-109, 151-165
    Lins, Osman, Pernambuco-Sao Paulo, 1924: Marinheiro de
    primeira viagem. Rio de Janeiro, Editora Civilizacao
    Brasileira, 1963. 165 pp.

14. ML  39-50, 97-103, 229-239
    Lopes, Moacir, Ceara, [illeg.]: Maria de cada porto. 2a Ed.
    Rio de Janeiro, Editora Civilizacao Brasileira, 1962. 289 pp.
    (1a Ed. Rio, 1959)

15. JM  11-21, 102-114, 113-137, [illeg.]-159, 176-188
    Martins, Joao, Bahia, 1916: Os indesmados. Rio de Janeiro,
    Edicoes O Cruzeiro, 1964. 242 pp.
16. MO  15-28, 72-85, 126-137
    Montello, Josue, Maranhao, 1917: O labirinto de espelhos. 2a
    Ed. Sao Paulo, Livraria Martins, 1962. 161 pp. (1a Ed. Rio,
    Jose Olympio, 1952).

17. SM  13-25, 95-108, 174-186
    Moraes, Santos (Jose), Bahia, 1920: Os filhos do asfalto.
    Rio de Janeiro, Ed. Jose Alvaro, 1964.

18. EN  21-31, 56-67, 99-110, 132-145
    Nascimento, Esdras do, Piaui, 1934: Solidao em familia. Rio
    de Janeiro, Editora Civilizacao Brasileira, 1963. 233 pp.

19. SP  3-17, 25-38, 54-66, 79-92, 99-111
    Paezzo, Sylvan: Epoca dos tristes. Rio de Janeiro, Editora
    Civilizacao Brasileira, 1964. 121 pp.

20. PP  37-60, 76-97, 150-171
    Porto, Sergio (Preta, Stanislaw Ponte), Sao Paulo: Primo
    Altamirando e elas. 2a Ed. Rio de Janeiro, Editora do Autor,
    1962 (1st Ed. 1962). 206 pp.

21. RR  19-31, 117-128
    Ramos, Ricardo, Alagoas, 1929: Os desertos. Sao Paulo,
    Edicoes Melhoramentos, 1961. 168 pp.

22. OR  5-17, 78-90, 185-197
    Resende, Otto Lara, Minas, 1922: O braco direito. Rio de
    Janeiro, Editora do Autor, 1963. 233 pp.

23. NR  19-24, 25-30, 37-42, 49-54, 67-72, 73-78, 137-142,
        173-178, 185-190, 197-202
    Rodrigues, Nelson, Pernambuco, 1912: 100 contos escolhidos:
    a vida como ela e. Vol. I, Rio de Janeiro, J. Ozon, 1961.
    316 pp.

24. FS  9-22, 49-64, 93-106, 151-164
    Sabino, Fernando, Minas, 1923: O encontro marcado. 5a Ed.
    Rio de Janeiro, Ed. Civilizacao Brasileira, 1960. 287 pp.

25. DT  5-20, 35-50, 79-92
26. JV  9-18, 39-53, 73-86
27. EV  24-36, 124-137, 248-260



Trevisan, Dalton, Parana, 1926: 
Morte na praga (contos). Rio de 
Janeiro, Editora do Autor, 1964. 
115 pp. 



Vasconcelos, Jose Mauro de, 

Rio Grande do Norte: Doidao , Sao 
Paulo, Exposigao do Livro, n.d. 
(1963) 102 pp. 



Verissimo, firico, R.G. do Sul, 
1905: O arquipelago . Vol. (O 
Tempo e o Vento , 3a Parte) ."“Porto 
Alegre, Editora Globo, 1961. 

304 pp. 
IDENTIFICATION TAGS AND CODES

1. Verbs.

a. The preterite forms of the verb SER will be typed with
a dart 7 following. The same applies to all forms derived
from the third person plural of the preterite of SER.

Foi bom eu ter falado com você.
FOI7/BOM/EU/TER/FALADO/COM/VOCE=/./

A bagagem ainda não foi despachada.
A/BAGAGEM/AINDA/NA+O/FOI7/DESPACHADA/./

que fôsse retirado até segunda ordem
QUE/FO=SSE7/RETIRADO/ATE'/SEGUNDA/ORDEM/

Se fôr possível Dona Odete ...
SE/FO=R7/POSSI'VEL/DONA/ODETE/

Os resultados dos exames foram ótimos.
OS/RESULTADOS/DOS/EXAMES/FORAM7/O'TIMOS/./

b. The following forms of VER will be followed by the 7
dart symbol: via; the preterite forms viu, viste, vimos and
viram; and the forms derived from viram.

Êle viu o amigo.          E=LE/VIU7/O/AMIGO/./
Vimos ela no jardim.      VIMOS7/ELA7/NO/JARDIM/./
Se você vir Maria ...     SE/VOCE=/VIR7/MARIA/./

c. A verb form with an attached pronoun will be typed
without a hyphen.

cuide-se                  CUIDE/SE7
alistar-se                ALISTAR/SE7
mudou-se                  MUDOU/SE7/
chama-se                  CHAMA/SE7/

Exception: Shortened infinitives.

levá-lo                   LEVA'-/LO/
ouvi-lo                   OUVI-/LO/
recebê-lo                 RECEBE=-/LO/
ajudá-la                  AJUDA'-/LA/



2. Articles and Pronouns.

a. Articles used as pronouns and their contractions will
be identified by a 7 directly after the pronoun, except
when the article is followed by que.

o daqui                   O7/DAQUI/
o do rio                  O7/DO/RIO/
a de português            A7/DE/PORTUGUE=S/
queres que eu o espere    QUERES/QUE/EU/O7/ESPERE/
no de abril               NO7/DE/ABRIL/

b. The reflexive pronouns ME, TE, SE, and NOS will be
followed by the 7 dart symbol.

eu nunca me separei dêle  EU/NUNCA/ME7/SEPAREI/DE=LE/
eu me lembro              EU/ME7/LEMBRO/
tenho me interessado      TENHO/ME7/INTERESSADO/
estamos nos combinando    ESTAMOS/NOS7/COMBINANDO/
pra se despedir           PRA/SE7/DESPEDIR/
não se preocupe           NA+O/SE7/PREOCUPE/

c. Colloquial direct object pronouns ÊLE, ELA, ÊLES,
ELAS, VOCÊ, VOCÊS are identified with a 7 following.

eu seguro êle mais essa   EU/SEGURO/E=LE7/MAIS/ESSA/
eu despacho êle para ...  EU/DESPACHO/E=LE7/PARA/
conhecem você             CONHECEM/VOCE=7/
eu ouvi você bem          EU/OUVI/VOCE=7/BEM/



3. Contractions.

a. Except as noted below, contractions are typed as
normally written without special identification.

nisso                     NISSO
desta                     DESTA

b. The contraction NOS, to be differentiated from the
object pronoun (no special identification) and the
reflexive pronoun (followed by 7), is printed with an
& ampersand following.

nos cursos                NOS&/CURSOS/
nos meninos               NOS&/MENINOS/

c. For the contractions À and Às, see the section on
accent marks.



4. Prepositions and Conjunctions.

a. The preposition A will be distinguished from the
articles or the pronoun by the & symbol.

começar a andar           COMEC,AR/A&/ANDAR/
está a caminho            ESTA'/A&/CAMINHO/
a viagem a Nova Iorque    A/VIAGEM/A&/NOVA7IORQUE/
Eu escrevi a Rubens       EU/ESCREVI/A&/RUBENS/
a respeito                A&/RESPEITO/

b. The interrogative por que is written with one
interspace, with the 7 symbol linking each of the
elements.

por que                   POR7QUE

5. Interjections.

a. Interjections that are used for a pause will be
followed by the 7 symbol. All others will be typed
without special identification. The following are
some that occur:

joy, admiration           AH/EH/OH/
pain                      AI/UI/
aversion                  IH/CHI/
plea                      O/
pause                     AH7/EH7/IH7/OH7/UH7/
call                      O=/
answer                    OI/
O.K.                      OK/

6. Multi-word expressions.

a. When an expression, such as a title, proper name or
multi-unit number, is composed of more than one word,
the elements are linked with the 7 symbol.

Rio de Janeiro            RIO7DE7JANEIRO
Maria Teresa              MARIA7TERESA
Pôrto Alegre              PO=RTO7ALEGRE
trezentos e cinco         TREZENTOS7E7CINCO
et cetera                 ET7CETERA

b. Exceptions are made for lengthy expressions (over 14
characters) and for those which are not bona fide
titles.

7. Abbreviations.

a. Standardized abbreviations are typed as pronounced.

CAPES/
COMSAT/
EMFA/
A/FAB/
INTELSAT/
A/OEA/
A/ONU/
A/VARIG/

8. Accent marks.

a. The following symbols, which follow the letters, are
used for accent and diacritical marks in Portuguese:

'     é                   E'
#     às                  A#S
=     êle                 E=LE
+     não                 NA+O
,     canção              CANC,A+O
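The accent convention above can be undone mechanically when a coded token has to be shown in ordinary spelling. The following is a minimal sketch in Python, not one of the project's programs. It assumes that the five symbols are the apostrophe, #, =, + and the comma, as the examples suggest, and that the comma counts as a cedilla only directly after the letter C; the name decode_token is hypothetical.

```python
import unicodedata

# Symbol typed after a letter -> Unicode combining mark.
# The mapping is inferred from the examples E', A#S, E=LE and NA+O.
MARKS = {
    "'": "\u0301",  # acute accent
    "#": "\u0300",  # grave accent
    "=": "\u0302",  # circumflex
    "+": "\u0303",  # tilde
}
CEDILLA = "\u0327"


def decode_token(token):
    """Turn one coded token (e.g. NA+O) into accented text (NÃO)."""
    out = []
    for ch in token:
        prev = out[-1] if out else ""
        if ch == "," and prev in ("C", "c"):
            # Assumption: a comma is a cedilla only right after C.
            out.append(CEDILLA)
        elif ch in MARKS and prev.isalpha():
            out.append(MARKS[ch])
        else:
            out.append(ch)
    # Compose each letter with its combining mark into a single character.
    return unicodedata.normalize("NFC", "".join(out))


for coded in ("E'", "A#S", "E=LE", "NA+O", "CANC,A+O"):
    print(coded, "->", decode_token(coded))
```

A comma standing alone in its interspace is left untouched, so sentence punctuation is not mistaken for a cedilla.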
9. Punctuation.

a. The system of punctuation adopted in transcribing the
conversations attempts to reflect the actual speech
patterns more closely rather than conform to the
standard rules of sentence punctuation.

b. A hyphenated word will remain hyphenated, except as
shown in paragraph 1c.

sexta-feira               SEXTA-FEIRA
capitão-de-corveta        CAPITA+O-DE-CORVETA

c. The shortened form of weekdays will retain the hyphen.

a sexta                   A/SEXTA-/

10. False starts, incomplete words, pauses, etc.

a. A single word repeated once or several times as the
speaker ponders what to say next will be linked with
the repetition, in the same interspace, by a dart 7.

eu eu encerro             EU7EU/ENCERRO/

A single word repeated for emphasis will not be linked.

não senti nada nada nada  NA+O/SENTI/NADA/NADA/NADA/

b. A false start involving part of one word immediately
substituted by another will be typed as follows:

preci                     /(PRECI-/

c. A false start involving a complete word immediately
substituted by another will be typed as follows:

eu você deve              /<EU/VOCE=/DEVE/

d. A false start that results in an incomplete thought or
sentence will be typed without special identification.

eu vou eu quero saber     /EU/VOU/EU/QUERO/SABER/

e. A substantial pause, for any reason, will be shown by
three suspension dots in the interspace. The symbol 7
will precede.

f. Laughter is indicated:

HA7HA7HA/
11. Colloquial forms.

a. A partial word reflecting colloquial speech habits
rather than a false start is typed as follows:

'tendi                    /EN)TENDI/
'pera                     /ES)PERA/
'tá (for ESTAR)           /ES)TA(R/
'tão                      /ES)TA+O/
'tava                     /ES)TAVA/
'tive                     /ES)TIVE/

b. However, the following colloquial forms are typed
verbatim, with symbols appearing only as heretofore
indicated.

cê                        CE=
cês                       CE=S
né                        NE'
pa                        PA
po                        PO
pra                       PRA
pras                      PRAS
pro                       PRO
pros                      PROS
tá                        TA'
tás                       TA'S
tou                       TOU
té                        TE'
viste                     VISTE
viu (ouvir)               VIU

12. Word variants.

a. The following pronunciation variant will not be
normalized but will be typed phonetically thus:

NUN (for não)

b. Both forms of the following are written out:

acessível                 ACESSI'VEL
accessível                ACCESSI'VEL
aeroporto                 AEROPORTO
aereoporto                AEREOPORTO
contato                   CONTATO
contacto                  CONTACTO
ínterim                   I'NTERIM
interim                   INTERIM
miligrama                 MILIGRAMA
miligramo                 MILIGRAMO
questão                   QUESTA+O
qüestão                   QUESTA+O?
queto                     QUETO
quieto                    QUIETO

c. Certain nicknames which are homographs of other words
are followed by the 7 symbol.

DADO7
VI7
TE'7
VIVI7

d. The names of the letters A, E, and O are followed by
the i symbol to distinguish them from articles and
prepositions.



13. Divergent forms and usages.

a. Internal errors in pronunciation will be typed as
pronounced, with the correct form preceding and separated
from the incorrect form with a closed parenthesis, all
in one interspace. If the error involves omitted
letters at the end of a word, the letters are added
preceded by an open parenthesis.

probrema                  PROBLEMA)PROBREMA
sa (sabe)                 SA(BE

b. Because of the difficulty of recognition of high
frequency sounds, the omission of the final s will not
be noted.

c. For errors in syntax, the symbol $ will be typed
before the error, followed by the space bar.

a problema                /$/A/PROBLEMA/

d. For the sake of consistency the contraction of the
preposition a and the feminine definite article will be
assumed wherever this interpretation is possible.

à sua disposição          /A#/SUA/DISPOSIC,A+O/
à Maria                   /A#/MARIA/

14. Non-identity.

a. Persons and places not to be identified are shown as
XXX. Two such names occurring together are written:

XXX7XXX

b. An unidentifiable word or phrase in context is shown
by

/zzz/
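The two parenthesis conventions of paragraph 13a can be separated again by a short routine. The sketch below is illustrative only and is written in Python rather than the BASIC of the project's own programs. The function name split_divergent is hypothetical. The sketch assumes that a parenthesis in first position belongs to a false start (section 10) and is left alone, and it does not handle the colloquial elisions of section 11a, which also use the closed parenthesis but join the two parts instead of opposing them.

```python
def split_divergent(token):
    """Return (intended_form, spoken_form) for one coded token.

    PROBLEMA)PROBREMA -> intended PROBLEMA, spoken PROBREMA
    SA(BE             -> intended SABE,     spoken SA
    Any other token is returned unchanged in both positions.
    """
    close = token.find(")")
    if close > 0:
        # Correct form precedes the closed parenthesis.
        return token[:close], token[close + 1:]
    opened = token.find("(")
    if opened > 0:
        # Letters after the open parenthesis were omitted by the speaker.
        spoken = token[:opened]
        return spoken + token[opened + 1:], spoken
    return token, token


print(split_divergent("PROBLEMA)PROBREMA"))
print(split_divergent("SA(BE"))
```

Keeping both forms lets a frequency count be made on the intended word while the spoken variant is still available for study.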



15. Considerations for optical scanning. 

a. All spaces are identified by the / slash mark. 

vou ser o chefe /VOU/SER/O/CHEFE/ 

vou ajuda-la /VOU/AJUDA ' -/LA/ 

b. All incomplete words will be broken at the end of the 
line, without a hyphen or space. 

c. A blob symbol H must precede all low level characters 
appearing in the first position of a line. 

Low level characters: 

11. # 12. = 27. ' 28. , 32. + 48. - 59. 

d. Dialogs may be carried over from one page to the next. 
The first line on every page gives the pertinent in- 
formation for the hard copy only (tape number, date, 
source, etc.) and carries the delete symbols. The in- 
formant number on the following page does not appear 
until the informant changes. 

e. The line delete symbol at the end of the line is 333 . 
At the beginning of a line deletion is accomplished by 
striking over the first three characters with the 
upper case N letter. 

■R-ttF/DE/ JANEIRO 

f. The first line on each page must be at least seventeen 
characters long. 

g. The 6 delta symbol is used only for colloquy headings 
and precedes the informant number. It is located on 
the line immediately above a new paragraph. 

A02fFS8b2 

16. Computer symbols. 

a. The following are used for sorting and do not print: - 

31. S 47. & 58. 6 62. ? 

b. The following symbols are not used: 

26. 1 29. T 42. F 

c. The period, comma, and question mark will print, but
will not sort, nor will they be listed in the KWIC:  . , ?
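Since every space is typed as a slash, a coded line can be cut into interspaces by splitting on that mark, and the three punctuation marks can then be set aside before sorting. The following Python sketch shows the idea under those assumptions; it is not the project's KWIC program, and the names tokens and sortable_tokens are hypothetical.

```python
# Marks that print but do not sort (paragraph 16c).
PUNCTUATION = {".", ",", "?"}


def tokens(line):
    """Split a coded line on the slash mark, dropping empty interspaces."""
    return [part.strip() for part in line.split("/") if part.strip()]


def sortable_tokens(line):
    """Tokens that would take part in sorting: punctuation is left out."""
    return [tok for tok in tokens(line) if tok not in PUNCTUATION]


line = "A/BAGAGEM/AINDA/NA+O/FOI7/DESPACHADA/./"
print(tokens(line))
print(sortable_tokens(line))
```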



SPECIAL PROVISIONS FOR TYPING LITERARY PORTUGUESE 



1. Every effort will be made to reproduce as closely as 
possible the language used by the various authors. For this 
reason it is necessary to add certain symbols: 



Exclamation mark 


/•/ 


Quotation 


/"/ 


Change of speaker 


/-/ 



These characters will be separated by spaces from other 
words. Also, to conform to the original texts, the colon 
and semi-colon will be used. : ; 



2. In the event the original text has a misspelled word or 
typographical error, the correct form only will be typed. 

3. The first line on each page will contain the author's 
name, title of the work, publishing house, and the date of 
the edition being used. 



4. Informant numbers will have first the delta £, symbol, 
then the three-character page number of the original text, 
next the two letter code for the author, and finally the 
three-character page number of the hard copy. Only the two 
letter author code will be used to obtain the speaker list - 
the number of different "speakers", in this case authors, 
who use each individual word of the corpus. A011AF001 



5. The paragraph symbol will be used to indicate a new
paragraph. It will not sort for a KWIC listing, but it will
print.

6. Future and Conditional forms:

dir-se-ão                 DIR-/SE7/-A+O/
encontrar-se-iam          ENCONTRAR-/SE7/-IAM/

7. Special forms:

D. Maria                  /D./MARIA/
1º                        /1.O/ (not zero)
1920                      /1920/
V.Ex.a                    /V.7EX.A/
Nº 1                      /N.O/1/
a fala (speech)           /A/FALA7/
nós (knots)               /NO'S7/
Dr. Ruas                  /DR./RUAS7/
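The informant number described in paragraph 4 lends itself to a simple speaker count. The Python sketch below is an illustration, not the program used for the report. It assumes that the delta symbol has already been removed, so that a heading reads like 011AF001 (page 11 of the original, author AF, page 1 of the hard copy); the names parse_heading and speaker_list are hypothetical.

```python
import re
from collections import defaultdict

# Three-character source page, two-letter author code, three-character copy page.
HEADING = re.compile(r"(\d{3})([A-Z]{2})(\d{3})")


def parse_heading(text):
    """Split a heading such as 011AF001 into (source_page, author, copy_page)."""
    match = HEADING.fullmatch(text)
    if match is None:
        raise ValueError("not an informant number: %r" % text)
    return int(match.group(1)), match.group(2), int(match.group(3))


def speaker_list(sections):
    """Count, for each word, how many different authors use it.

    sections is an iterable of (heading, list_of_words) pairs.
    Only the two-letter author code enters the count.
    """
    authors = defaultdict(set)
    for heading, words in sections:
        _, author, _ = parse_heading(heading)
        for word in words:
            authors[word].add(author)
    return {word: len(codes) for word, codes in authors.items()}


sample = [
    ("011AF001", ["O", "MAR"]),
    ("012AF002", ["O"]),
    ("037RB003", ["O", "NAVIO"]),
]
print(speaker_list(sample))
```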



TAPE 33A SAO PAULO SP 18 MARCH 13b8 008FH-U 003MC 010MC OllFH YT333 
PAGE 133333 

LEMA/ NENHUMB/ , / AH7/ 7 . . . /EU/ FALEI/COM/MEU/ME ' DICO/ ANTES/ , / NA+O/TEVE /PROBL 
EMA/, /MAS/RONALDOB/TE VE/RUBE 'OLA/./ AH7/7.. ./E=LE/JA' /ESTA' /BOM/ , /PASSOU/ 
TUD0/IH7/7.. ./ APESAR / DE7DEB /TERMOS/TI DO/UMD /GRANDE /SUSTO/ . /MAS/NA/MESMA/ 
HORAB /EU/LIGUE I /PARA / O/MEU/ME ' DICOI / E/ E=L E / DISSE/ QUE/NA+O/ TINHA/ PROBLEMA 
/NENHUM/, /MESMO/QUE/EU/NA+O/TIVESSE /TIDO/ , /QUE/ NA+O/ TERI A/ PROBLEMA/ , /POR 
QUE/NO/SE ' TIMO/ME=S/ NA+O/TEM/MAIS/PROBLEMA/ NENHUM/ , / NENHUM/ „ /MASB /MESMO/ 
ASSIM/NO 'S/FICAMOS/MEIO/CHATEADOS/E/PREOCUPADOS/ ,/MAS/DEPOIS/FALEI/COM/O 
UTROS/ME ' DICOS/E/TODOSi/DISSERAM/A/MESMA/COISA/./ AI NDAI/MA IS/QUE /EU/ JA ' / 
TIVE/ E / NA+O /TEM/ AS SIMB/ PROBLEMA /MESMO/. /E=L E/ FICOU/ A/SEMA NA/INTE IRA/ EM/C 
ASA/ I H V/ V.. ./ (ESTO-/ESTOUROU/ DOMIN-/ NO/ OUTRO/ DOMINGO/, /QUER/DIZER/ ./HOJE 
/ JA'R/FAZ/MAIS/DE/UMA/SEMANA/./E/ACHO/QUE/0/ME'DICO/RECOMENDOU/QUEB/E=LE 
/FO=SSE/TRABALHAR/TERC, A-/ OU/QUARTA-/SO '/ , /AMANHA+/0U/DEP0ISB/./AH7/7.. . 
/ FICOU /CHEIO/DE/FICAR/EMB/ CAS A/, /DETESTAB/F ICAR/, /E/PRI NCI PALMEN TE /SEM/P 
ODER/ LER/NEM/FAZER/ NADA/PORQUE/A/VI STA/ES )TAVA/MUITO/IRRITADA/./MAS/FORA 
/IS SO/ NA+O/DEU/NENHUMA/OUTRA/ COMPLI CAC , A+O/ ,/ NA+O/ TEVE/PROBLEMAB / NENHUM/ 
B./NO' S/ JA'/ESTA'VAMOS/ MAI S/OU/MENOS/ESPER ANDO/, /PORQUE/ESTA'/UMA/EPIDEM 
IA/INCRI 'VEL/ DE/RUBE 'OLA/AQUI/ i /TA' / TODO/MUNDO/COM/ RUBE ' OL A/ . /£/ NO ' S/ J A ' 
/ES )TA ' VAMOS / IMAGINANDO/QUE / V. . . /QUE / E=LE/ NA+O/IR I A /SE7/L I VRAR/ DE SSA./_. / M 
ASB/NUN/TEM/MAIS/PROBLEMA/NENHUM/, / NUN/SE7/PRE0CUPEM/ ,/TA ' /TUDO/EM/ORDEM 
/. /EH V /7.../ DAISY/, /COMOB/ E ' /QUE/VOCE=/ESTA ' / , / J A ' / ENGORDOU/ ALGUMA /COISA 
/7/CE=/TA'B/PASSANDO/BEM/7/ADOREI/SABER/QUE/CE=/ , /QUE/TA' /TUDO/EM/ORDEM/ 
EB/TAL/./AHV/V.../EU/ < RE-/RECEBI/A/SUA/CARTA/E/ESS AB/QUE/ EU/MANDE I /A/SEM 
ANA/PASSADA/FOIB/RESPONDENDO/A/SUA/ , /IHV/7.../FALANDO/SO=BRE/AS/COISAS/D 
0/BEBE=/E/TALB/QUE/ALIA'S/E ' /O/ASSUNTO/DA/MODA/PRA/ NO 'S/ , / NUN/E ' / ? /CARLO 
S7EDUARD0/ , /COMO/E ' /QUE / VA+O/ AS/COI S AS/POR/ AI ' B/ 7 /O/TRABALHO/E/ OS / PREPAR 
ATIVOS/DE/VOCE=S/TAMBE , M/7/ACHEI/ESPETACULARB/E/<VO-/ESPETACULAR/VOCE=S/ 
IREM/A7A&/MA'LAGA/E/A#/ALDEIA/D0/V0V0=/E/VISITAR/A/FAMI ' L I A /TO=DA / . /ACHE 
IB/UMA/ IDE ' I A/ 0 ' TIMA/ . /MORRI /DE/INVE J AB/ , /ACHEI/MUIT0/BACANA/./AH7/7. . . / 
AH7/7. . ./BOM/ , /VAMOS/VER/SE/CE=S/OUVIRAM/./ CARL0S7EDUARD0/E /DAISY / FALEM/ 
UM/POUCO/./PRA/DEPOIS/EU/FALARB/MAIS/./ 

A010MC133 

MUITO/ BEM/ESCUTA DO /./CEM/POR /CENTO/,/ BEATRI Z/ ,/0#TIMAMENTE/ESCUTADO/E/QU 
E/SUSTO/E=SSE/NEGO'CIO/DE/DOENC,A/, /QUE /NA+O /SABI ' AMOS/NADA/./NA+O/CHEGO 
UB/A/SUA/CARTA/DA/U ' L TIMA/S EMANA/ , /NA+O/ . /TALVEZ/7. ../TALVEZ/, /ENTA+O/ , / 
CHEGUE/AMANHA+/ . /ENTA+O/ EU/QUERO/QUE7QUE/0/MICROFONE/VA ' / PARA/0/CARL0S7E 
DUARDO/./B 
ADD3MC133 

E'/SO '/EU/AMEAC, AR/DEB/FALAR /COMEC , A /A&/DAR /CRAMPE / AQUI/ ./FALAM/TANTO/QU 
EB/QUANDO/EU/COMEC,0/A&/FALAR/DA'/CRAMPE/./BEM/JA' /QUE/A/MINHA/VOZB/E ' /Q 
UE/ESTA' /SENDO/ (PRE-/PRECISA/SER/OUVIDA/AI ' / , /EU/QUE / NA+O/ F ALO/ , / EU/QUE/ 
SOU/CHATO/, /VOU/FALAR/ POR/ TODO/MUNDO/HOJE/ AQUI/. /A /DAISY/ TA' /BOAB/ , /PASS 
ANDO/MUITO/BEM/ ,/TA'i/0'TIMA/, /NA+O/ ENGORDOU/ , /TA' /DISFARC, ANDO/DIREITIN 
H0B/AINDA/IH7/7.../VAI/IND0/MUIT0/BEM/./RECEBEU/0/CARTA+0/DE/ANIVERSA'RI 
0/, /TA' /AGRADECENDOB/ , /MANDANDO/UM/A BRAC , 0/PRA/V0CE=/ . /SABE/QUE/EU/ESTOU 
/ (CAN-/ FICO/QUIETINHO/ AQUI /MAS/ EST0U/0H7/7. . . /LENDO/TO=DAS / AS/ SUAS/ CARTA 
S /COMB/ MUITO /PRAZER/ , / MUITO/ GO STOSO/ » /IH7/7. . . /APROVEITANDO/ , /SABENDO/TO 
B=DAS/AS/ NOTI ' CIAS/E/ ACOMPANHANDO/TUDO/ . / TAMBE 'M/ESTOU/COM/MUITAS/SAUDAD 
RUBEM BRAGA. AI DE TI, COPACABANA. RIO. EDITORA DO AUTOR. PP 37— b7DDD
IRbO, PAGE 04-R3DD 
AD37RB0f 3 

UM/TELEFONEMA/BAPENAS/CORDI AL/ , /A&/QUE/ATE NDOI/COM/ NATURAL I DADE /-/MAS/ PO 
RQUE/ t / DEPOIS / , /E=SSE/INDEFINI 'VEL /TREMOR /I ' NTIMOi/ , /ESSA/REMOTA/ NOC, A+O 
/.DE/QUEI/REPRE SENTE I / UMA/CE NA/SOBB/O/EFEITO/DO/HI P NOTISMO / . / E = SSE/INDIZI 
■ 'VEL /SUSTOI/7 / SOU/UM/HOMEM/TRANQUI LO/, / E/MINHA/VIDAI/ESTA ' /TRANQUILA/ i / 
OUC * 0 / ESSA/ VOZ / i /E=SSE/NOME / t/EI/PRONTO/' / — /COMEC » OI/A&/AG IR/COMO/SE/EUB 
/TRAB ALHASSE I /EM/UM / F ILMEI/A&/QUE/EU/MESM01/ESTIVESSEI/ASSISTINDO/./REPR 
ESENTO/MEU/PAPELI/DE/MANEIRA I /NORMAL /E/FAC » OI/O/PAP EL/DE/ UM/HOM EM/ NORMAL 
/i/MAS/HA'B/UM/OUTRO/EU/INVISI'VELI/QUE/E ' / AQUALOUCO/ . /P ATINADORi/SO=BRE 
/ARCO-I 'RISB/ , /MENINO/TONTO/ , / HAML ET/ , /PAL E RMA/ , /P ATE' TI CO/ ./ENQUANTO/EU 
/DIGO /UMA/COIS A/SENS ATA/E=SS E /MEU/ F ANTASMA / SE7/ENTREGA/ A&/ UM/SI L ENCIOSO/ 
DESVARIO/,/OU/RECITAi/VERSOS/ANTIGOS/,/VOA/COMO/UM/ANJO/, /SOLUC, A/./POSS 
O/CONTEMPLA' -/LO/COM/FRIEZA/ , /CRITI CA ' -/LO/ , /TER/P ENA/DE = L El/ i / E VI TO/QUE 
/E=LE I / INFLUA/ NO/MAI S/MI ' NIMOI/EM/MI NHA/CONDUTAI/R EALI/ » /QUANDO/E = LE/TEM 
/UM/IMPULSO/DE/FALAR/AO/TELEFONE/E U/ME7/PONHO/TRANQUILAMENTE/A&/DESCASCA 
RH/UMA/LARAN J AI/OU/F AZER/PONTA/EM/ UM/ LA' PIS B/» / E/S EM/MI NHASB/MA+OS/ i / SEM 
/MEUH /CORPO/ t /E = LEI/ NA+O/PODE / FAZER/NADA/ . /RESOLVO/IGNORA ' - /LO / E/CHEGOI/ 
A&/ESQUECE=-/LO/DUR ANTE/SEMANAS / 1 / MESES/ » /MAS/QUA NDO/SURGE /A/PRE S ENC,A7/ 
E=LED /SALTA/ AO/MEUI/ L ADO/, /SOB/UMA/ LUZ/SOBRENATUR ALB/,/ ABSURDO/E/INFANTI 
L/./Tt/ 

A038RB04 R 

NA+O/ ESTOUI/APAIXONADOI / i / MEU/COME * R CIO! /SENTIMENTAL /COM/ AS/OUTR AS/CRI AT 
UR AS /COR RE/NOR MALI/ /COM/SUAS/ALEGRIASI/E/TRISTEZASI/./NA+O/ESTOU/APAIXO 
NADOB/ , / MAS/POS SO/ VER /A/FACE/ DA/PAI XA+O/. /E/POR/UM/INSTANTE/FICOi/PARADO 
/i/MUDO/i/COMO/QUEM/OUVISSE/ » /NO/F UNDO/DA/ NOITE/ , /0/SUSSURRO/DAS/ESTRE=L 
AS/. /E /07/RE CON HECESSE/./H/ 

A03RRB050 

ANTO=N I07MAR I AB/BCONTOU/QUE/ UMA/VEZ / IA/NUM/TA' XI / GUIADOB/P OR/UM/ CHOFERB/ 
PORTUGUE = SB/VELHO/i/BIGODUDOB/i/CALADO/i /DE/CARA/TRISTE / . /QUANDO/O/CARRO 

/chegoui/a#/praia/o/chofer/viu7/um/barco/e/exclam’oub/./apontando/com/o/b 

RACiOB/ESTICADOB/./OS/OLHOS/ BRILHANTES / , / N UM/TOM/DE/DESCOBERTAB/ , /DESAFI 
OB/E/ALEGRIA/:/ t il/-/OLHA/0/NAVIO/PEQUENINOB/ , /TI/ESSA/FASCINACiA+0/DOS/POR 
TUGUE = SES/PEL OS/NAVI OS/ME/S ALVOU/ A/ TARDE/DE/ONTEM/ . /EU/TINHAI/DE/ IR/A#/A 
LFA=ND EGA/E/ . / PORTANTO/ , /PASSAR/PEL A/PRAC . AB/MAUA ' / ./0/PORTUGUE = SB/DO/VO 
LANTEi/VINHA/PRAGUEJ ANDO/CONTRAI/O/CALOR/ . / CONTRA/ OS/OUTROS/CARROS/, /CON 
TRAB/TUDO/. / ANTES/D E = LE/EU/ VI / 0/"/V ERA7CRUZ/ "/ENCOSTADO/NO/CAIS/ . /E/DISS 
El / : / " / OLHEB / O/VERAVCRUZ/" / . /QUE/N AVIO/BON ITO/ ' /" /E = LE/BRE CEBEU/ ISSO/COM 
OD/UM/ELOGIO/PESSOAL/E /COMEC . OU/A&/FALAR/DO/NAVIO/COM/ENTUSIASMO/ . /ATE' / 
CONHE C IA/UM/MAQUINISTA/DE/BORDOI/E /VISI TARA/ TODO/O/ GIGANTE ■/:/" / TEM/OITO 
/AND ARES/, /MAS /TEMB/ELEVADOR/ ' /"/TI/ 

AD4DRBOf R 

PELASB/CINCO/E/POUCO / . / AO/ VOL TAR/ PAR A/CAS A / , /ME/TO COU/OUTRO/VOL ANTE/PORT 
UGUE = SB / . /NA/ ALTURA/ DO/FLAM E N GO/DI VI SEI/O/ NAVI OB / » /QUE/MARCHAVA/P ARA/A/S 
AI' DA/ DA/BARR A/. /E/RE SOLVI/ELOGIAR/ NOVAMENTE /O/BARCOI/. /PARA/VER/O/EFEIT 
OB/. / F OI7/MAR A VI LHOSO/. /I'VE • /REALMENTE/ , / E ' /REALMENTE/ . / E ' /UM/BELO/NAVI 
01/ ' / "/FIZ/NOTAR/QUE /O/BRASIL/NA+O/ TINHA/NENHUM/NAVIO/DE/P ASSAGE IROS/TA+ 
O/GRANDE/E/ TA+O/BONITOI/ . /E / ISSO/ANIMOUI/AI NDA/MAIS/O/HOMEM/ ./ACABOU/CON 
SORT

9000 REM PROGRAM-- SORT***
9001 REM
9002 REM WRITTEN BY-- DP3 WILLIAM TAKACS OCTOBER 1,1971
9003 REM
9004 REM DESCRIPTION-- THIS PROGRAM WILL SORT ANY FILE OF ASCII
9005 REM               CHARACTERS INTO EITHER ASCENDING OR DESCENDING
9006 REM               ORDER ACCORDING TO ANY SIZE OR NUMBER OF
9007 REM               CHARACTER FIELDS. THERE IS A MAXIMUM OF
9008 REM               FOUR INPUT FILES IN EACH SORT RUN.
9009 REM
9010 REM INSTRUCTIONS-- SAVE AN EMPTY FILE FOR THE SORTED DATA.
9011 REM
9012 REM                ANSWER THE QUESTIONS AS
9013 REM                THEY ARE ASKED BY THE COMPUTER.
9014 REM MAIN PROGRAM---
9015 LET Q2$="SORTEND***"
9016 FILE #6:"*"
9017 DIM Q(100)
9018 PRINT "HOW MANY INPUT FILES";
9019 INPUT Q0
9020 PRINT
9021 IF Q0>4 THEN 9086
9022 FOR Q5=1 TO Q0
9023 PRINT "WHAT IS INPUT FILENAME # ";Q5;
9024 INPUT Q1$
9025 FILE #Q5:Q1$
9026 NEXT Q5
9027 PRINT "WHAT FILE DO YOU WANT THE SORTED DATA WRITTEN INTO";
9028 INPUT Q1$
9029 PRINT
9030 FILE #5:Q1$
9031 LET Q9=1
9032 PRINT "DO YOU WANT TO SORT INTO 'ASCENDING' OR 'DESCENDING' ORDER"
9033 INPUT Q1$
9034 PRINT
9035 CHANGE Q1$ TO Q
9036 IF Q(1)=65 THEN 9038
9037 LET Q9=-1
9038 PRINT "HOW MANY FIELDS DO YOU WANT TO SORT ON";
9039 INPUT Q4
9040 PRINT
9041 WRITE #6:Q0,0,Q4,1,1,1,0
9042 PRINT "FOR EACH SORT FIELD-- STARTING WITH THE FIRST FIELD"
9043 PRINT "TO BE SORTED ON, THEN THE NEXT, AND SO ON -- TYPE THE"
9044 PRINT "CHARACTER POSITIONS OF THE SORT FIELD (E.G., TO SPECIFY A"
9045 PRINT "SORT FIELD OF CHARACTERS 3,4,5,&6, YOU TYPE -- 3-6)"
9046 PRINT
9047 FOR Q5=1 TO Q4
9048 PRINT "SORT FIELD # ";Q5;
9049 LINPUT Q1$
9050 CHANGE Q1$ TO Q
9051 LET Q7=-1
9052 LET Q8=0
9053 FOR Q2=1 TO Q(0)
9054 IF Q(Q2)=32 THEN 9060
9055 IF Q(Q2)=45 THEN 9063
9056 LET Q6=Q(Q2)-48
9057 IF Q6<0 THEN 9076
9058 IF Q6>9 THEN 9076
9059 LET Q8=10*Q8+Q6
9060 NEXT Q2
9061 REM
9062 GOTO 9066
9063 LET Q7=Q8
9064 LET Q8=0
9065 GOTO 9060
9066 IF Q7>-1 THEN 9068
9067 LET Q7=Q8
9068 IF Q7>Q8 THEN 9083
9069 WRITE #6:1,Q7,Q8,Q9,0,0,0,0
9070 NEXT Q5
9071 ON Q0 GOTO 9072,9073,9074,9075
9072 CHAIN Q2$ SYSTEM "SORT" WITH #6,#1,#5
9073 CHAIN Q2$ SYSTEM "SORT" WITH #6,#1,#2,#5
9074 CHAIN Q2$ SYSTEM "SORT" WITH #6,#1,#2,#3,#5
9075 CHAIN Q2$ SYSTEM "SORT" WITH #6,#1,#2,#3,#4,#5
9076 IF Q(Q2)=44 THEN 9080
9077 PRINT " A RANGE OF WHOLE NUMBERS IS THE ONLY VALID RESPONSE "
9078 PRINT " TO THIS QUESTION. RE-TYPE YOUR RESPONSE CORRECTLY."
9079 GOTO 9048
9080 PRINT " PLEASE TYPE ONLY ONE RANGE OF NUMBERS PER FIELD."
9081 PRINT " RE-TYPE YOUR RESPONSE CORRECTLY."
9082 GOTO 9048
9083 PRINT " THE RANGE OF CHARACTERS MUST BE SPECIFIED IN ASCENDING"
9084 PRINT " ORDER. PLEASE RE-TYPE YOUR RESPONSE CORRECTLY."
9085 GOTO 9048
9086 PRINT "THIS PROGRAM ALLOWS FOR A MAXIMUM OF FOUR INPUT FILES."
9087 PRINT "CONTACT THE PROGRAMMING ASSISTANT AT EXT. 2185 FOR"
9088 PRINT "INSTRUCTIONS ON SORTING DATA CONTAINED IN MORE THAN"
9089 PRINT "FOUR FILES."
9090 END
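The listing above only collects the sort fields and hands the work to the system SORT routine. For readers without access to that system, the same idea can be sketched in Python: each field is a range of character positions such as 3-6, and the lines are ordered on those fields in turn. This is an illustration under stated assumptions, not a translation of the system routine; the names parse_range and sort_lines are hypothetical, and character positions are taken to count from 1.

```python
def parse_range(text):
    """Read a reply such as '3-6' or '5' and return (first, last) positions."""
    text = text.replace(" ", "")
    if "," in text:
        raise ValueError("please type only one range of numbers per field")
    first, dash, last = text.partition("-")
    if not first.isdecimal() or (dash and not last.isdecimal()):
        raise ValueError("a range of whole numbers is the only valid response")
    start = int(first)
    end = int(last) if dash else start
    if start > end:
        raise ValueError("the range must be specified in ascending order")
    return start, end


def sort_lines(lines, fields, descending=False):
    """Sort lines on the given (first, last) character fields, in order."""
    def key(line):
        # Positions are 1-based and inclusive, hence the start - 1.
        return tuple(line[start - 1:end] for start, end in fields)
    return sorted(lines, key=key, reverse=descending)


fields = [parse_range("1"), parse_range("2-2")]
print(sort_lines(["b2", "a9", "a1"], fields))
print(sort_lines(["b2", "a9", "a1"], fields, descending=True))
```

The three error messages mirror the three replies that the BASIC program rejects: more than one range, a reply that is not a whole number, and a range typed in descending order.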
SORTEND***

10 REM PROGRAM -- SORTEND***
15 REM
20 REM DESCRIPTION -- THIS PROGRAM DOES THE WRAP-UP PROCEDURES
25 REM                FOR THE BASIC PROGRAM SORT***.
40 REM
60 PRINT
65 END
LOCATE

100 '
110 ' Description ---
120 '
130 ' Written by Carl Tannenbaum and Edward Rippon for
140 ' Prof. J. A. Hutchins. This program will locate unique
150 ' strings in any file, transferring them to a separate
160 ' file named LOCA which must be SAVED before running
170 ' the program.
180 '
190 ' PROGRAM ---
200 DIM Z(100)
210 PRINT "INPUT FILE";
220 INPUT F$
230 FILE #1:F$
240 FILE #2:"LOCA"
250 SCRATCH #2
260 MARGIN #2:130
270 PRINT "ARE YOU USING A REGULAR TERMINAL";
280 INPUT C$
290 IF C$="YES" THEN 320
300 MARGIN 130
310 GO TO 330
320 MARGIN 70
330 LET Z1=1
340 PRINT "WHICH STRING DO YOU WISH TO LOCATE"
350 PRINT
360 INPUT A$
370 LET C=0
380 RESET #1
390 IF END #1 THEN 510
400 LINPUT #1:B$
410 LET N=1
420 LET S=POS(B$,A$,N)
430 IF S=0 THEN 390
440 IF S-1+LEN(A$)>=LEN(B$) THEN 390
450 IF A$<>SEG$(B$,S,S+LEN(A$)-1) THEN 490
460 PRINT #2:B$
470 LET C=C+1
480 GO TO 390
490 LET N=S+1
500 GO TO 420
510 PRINT
520 PRINT
530 PRINT "THERE WERE";C;"MATCHES FOUND."
540 PRINT
550 PRINT
560 PRINT #2:
570 PRINT #2:
580 RESET #2
590 LET Z(Z1)=C+2
600 LET T9=0
610 IF Z1=1 THEN 680
620 FOR I=1 TO (Z1-1)
630 LET T9=T9+Z(I)
640 NEXT I
650 FOR J=1 TO T9
660 LINPUT #2:D$
670 NEXT J
680 FOR K=1 TO Z(Z1)
690 LINPUT #2:A$
700 PRINT A$
710 NEXT K
720 LET Z1=Z1+1
730 PRINT
740 PRINT "DO YOU WISH TO LOCATE ANOTHER STRING";
750 INPUT Y$
760 IF Y$="YES" THEN 340
770 END
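The core of LOCATE is a search of every line for a given string, with the matching lines copied out and counted. A minimal Python sketch of that step follows. It uses a plain substring test and leaves out the file handling and terminal margins of the BASIC listing; the name locate is hypothetical.

```python
def locate(lines, needle):
    """Return the lines that contain needle, together with their count."""
    matches = [line for line in lines if needle in line]
    return matches, len(matches)


corpus = [
    "EU/ME7/LEMBRO/",
    "NA+O/SE7/PREOCUPE/",
    "EU/OUVI/VOCE=7/BEM/",
]
found, count = locate(corpus, "/SE7/")
print("THERE WERE", count, "MATCHES FOUND.")
for line in found:
    print(line)
```

Searching for a string with its slashes, as in /SE7/, restricts the match to a whole interspace, so that SE7 inside a longer token is not picked up.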
MULTIMUV

100 REM WRITTEN BY J.W. SCHWAB FOR PROF. JOHN A. HUTCHINS
110 REM
120 REM
130 REM DESCRIPTION- THIS PROGRAM WILL MOVE ANY NUMBER OF
140 REM              VERTICAL COLUMNS TO ANY DESIRED POSITION.
150 REM              THE COLUMNS OF THE INPUT FILE MUST BE
160 REM              SEPARATED BY THE CHARACTER "±". THE
170 REM              MARGIN IS SET FOR THE IBM 2741, AND
180 REM              SHOULD BE CHANGED IF THIS PROGRAM IS
190 REM              RUN ON A STANDARD TELETYPE TERMINAL.
200 REM
210 REM
220 REM PROGRAM
230 DIM I$(130),J$(130),F(130),O(130),P(130),T(130)
240 PRINT "WHICH FILE DO YOU WANT REARRANGED";
250 LINPUT A$
260 PRINT "IN WHAT FILE DO YOU WANT RESULTS TO BE STORED";
270 LINPUT B$
280 FILE #1:A$
290 FILE #2:B$
300 SCRATCH #2
310 MARGIN #2:130
320 PRINT "HOW MANY VERTICAL COLUMNS (FIELDS), SEPARATED BY"
330 PRINT "THE (±), ARE THERE IN THE DATA FILE";
340 INPUT N
350 PRINT "IDENTIFY THE";N;"FIELDS FROM LEFT TO RIGHT."
360 FOR B=1 TO N
370 INPUT I$(B),
380 NEXT B
390 PRINT "IDENTIFY WHICH COLUMN YOU WANT ON THE LEFT, NEXT, NEXT,"
400 PRINT "NEXT, ETC. USE EXACT TITLES FOR EACH VARIABLE AND"
410 PRINT "AFTER EACH, FOLLOWED BY A COMMA, INDICATE THE TAB MARKER"
420 PRINT "FOR EACH COLUMN."
430 FOR B=1 TO N
440 INPUT J$(B),T(B),
450 NEXT B
460 LET C=1
470 FOR B=1 TO N
480 IF I$(B)<>J$(C) THEN 530
490 LET F(C)=B
500 LET O(C)=C
510 LET C=C+1
520 GO TO 470
530 NEXT B
540 FOR B=1 TO N
550 NEXT B
560 LINPUT #1:A$
570 LET E=1
580 FOR B=1 TO N-1
590 LET P(B)=POS(A$,"±",E)
600 LET E=P(B)+2
610 NEXT B
620 LET P(B+1)=LEN(A$)+2
630 LET C=1
640 FOR B=1 TO N
650 LET B$(B)=SEG$(A$,C,P(B)-1)
660 LET C=2+P(B)
670 NEXT B
680 FOR B=1 TO N
690 PRINT #2:TAB(T(B));B$(F(B));
700 NEXT B
710 PRINT #2:
720 IF MORE #1 THEN 560
730 PRINT
740 PRINT
750 PRINT "SUPER-MOVER HAS COMPLETED ITS TASK"
760 END
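What MULTIMUV does to a single line can be shown compactly in Python: the line is cut at the ± character, the pieces are given the titles typed by the user, and they are written out again in the requested order, each at its tab position. The sketch below is illustrative only. It assumes that tab positions count from 1 and that blanks around each field may be discarded; the name rearrange is hypothetical. Leaving out the tab positions gives the behavior of NOTAB.

```python
def rearrange(line, names, wanted, tabs=None, sep="±"):
    """Reorder the sep-separated fields of one line.

    names  - titles of the fields as they stand, left to right
    wanted - titles in the order desired for the output
    tabs   - optional starting column (1-based) for each output field
    """
    fields = [part.strip() for part in line.split(sep)]
    by_name = dict(zip(names, fields))
    out = ""
    for index, name in enumerate(wanted):
        if tabs is not None:
            # Pad with blanks so the field starts at its tab position.
            out = out.ljust(tabs[index] - 1)
        out += by_name[name]
    return out


line = "casa ± 12 ± noun"
print(rearrange(line, ["word", "freq", "class"], ["freq", "word"], tabs=[1, 10]))
```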
NOTAB

100 REM WRITTEN BY J.W. SCHWAB FOR PROF. JOHN A. HUTCHINS
110 REM
120 REM
130 REM DESCRIPTION- THIS PROGRAM WILL MOVE ANY NUMBER OF
140 REM              VERTICAL COLUMNS TO ANY DESIRED POSITION.
150 REM              THE COLUMNS OF THE INPUT FILE MUST BE
160 REM              SEPARATED BY THE CHARACTER "±". THE
170 REM              MARGIN IS SET FOR THE IBM 2741, AND
180 REM              SHOULD BE CHANGED IF THIS PROGRAM IS
190 REM              RUN ON A STANDARD TELETYPE TERMINAL.
200 REM              TO INDICATE TAB POSITIONS FOR EACH
210 REM              COLUMN, USE MULTIMUV PROGRAM.
220 REM
230 REM
240 REM PROGRAM ---
250 DIM I$(130),J$(130),F(130),O(130),P(130)
260 PRINT "WHICH FILE DO YOU WANT REARRANGED";
270 LINPUT A$
280 PRINT "IN WHAT FILE DO YOU WANT RESULTS TO BE STORED";
290 LINPUT B$
300 FILE #1:A$
310 FILE #2:B$
320 SCRATCH #2
330 MARGIN #2:130
340 PRINT "HOW MANY VERTICAL COLUMNS (FIELDS), SEPARATED BY"
350 PRINT "THE (±), ARE THERE IN THE DATA FILE";
360 INPUT N
370 PRINT "IDENTIFY THE";N;"FIELDS FROM LEFT TO RIGHT."
380 FOR B=1 TO N
390 INPUT I$(B),
400 NEXT B
410 PRINT "IDENTIFY WHICH COLUMN YOU WANT ON THE LEFT, NEXT, NEXT,"
420 PRINT "NEXT, ETC. USE EXACT TITLES FOR EACH VARIABLE."
430 FOR B=1 TO N
440 INPUT J$(B),
450 NEXT B
460 LET C=1
470 FOR B=1 TO N
480 IF I$(B)<>J$(C) THEN 530
490 LET F(C)=B
500 LET O(C)=C
510 LET C=C+1
520 GO TO 470
530 NEXT B
540 FOR B=1 TO N
550 NEXT B
560 LINPUT #1:A$
570 LET E=1
580 FOR B=1 TO N-1
590 LET P(B)=POS(A$,"±",E)
600 LET E=P(B)+2
610 NEXT B
620 LET P(B+1)=LEN(A$)+2
630 LET C=1
640 FOR B=1 TO N
650 LET B$(B)=SEG$(A$,C,P(B)-1)
660 LET C=2+P(B)
670 NEXT B
680 FOR B=1 TO N
690 PRINT #2:B$(F(B));
700 NEXT B
710 PRINT #2:
720 IF MORE #1 THEN 560
730 PRINT
740 PRINT
750 PRINT "SUPER-MOVER HAS COMPLETED ITS TASK"
760 END



RITEJUST

100 REM: Written by Carl Tannenbaum
110 REM:
120 REM: DESCRIPTION: This program will right justify
130 REM:              a column of numbers that is left
140 REM:              justified, provided that spaces
150 REM:              precede and follow the column.
160 REM:              Tab positions can be the same as
170 REM:              those used in MULTIMUV. The relative
180 REM:              position of the column does not
190 REM:              change between input and output file.
200 REM:
210 REM: MAIN PROGRAM -
220 REM:
230 PRINT "WHAT FILE HAS THE COLUMN YOU WANT RIGHT JUSTIFIED";
240 INPUT A$
250 FILE #1:A$
260 PRINT "IN WHICH FILE ARE THE RESULTS TO BE STORED";
270 INPUT B$
280 FILE #2:B$
290 SCRATCH #2
300 MARGIN #2:130
301 PRINT
310 PRINT "TAB POSITION WHERE LEFT JUSTIFIED NUMBER COLUMN BEGINS"
320 INPUT C
330 PRINT "Tab position where next column to the right begins";
340 PRINT
350 INPUT C2
360 LET C=C+1
370 PRINT "Tab position immediately after largest left justified"
380 PRINT "number";
390 INPUT D
400 LET D=D+1
410 LINPUT #1:A$
430 LET P=POS(A$," ",C)
440 LET N$=SEG$(A$,C,P-1)
450 FOR I=1 TO D-C-LEN(N$)
460 LET N$=" "&N$
470 NEXT I
480 PRINT #2:SEG$(A$,1,C-1);N$;TAB(C2-1);SEG$(A$,C2,LEN(A$))
490 IF MORE #1 THEN 410
500 PRINT
510 PRINT "Now go to output file"
520 END
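
For readers without access to this BASIC dialect, the core step of RITEJUST can be sketched in Python. This is only an illustration under stated assumptions: the function name and the 0-based column arguments are hypothetical, and the sketch handles one line at a time instead of reading and writing files.

```python
def right_justify_column(line: str, start: int, end: int) -> str:
    """Right-justify the number that begins at 0-based column `start`.

    `end` is the 0-based column just past the widest number, so the
    field occupies line[start:end]. Text outside the field is unchanged.
    (Both arguments are assumptions standing in for the tab positions
    that RITEJUST asks for interactively.)
    """
    field = line[start:end]
    number = field.split(" ", 1)[0]          # characters up to the first blank
    return line[:start] + number.rjust(end - start) + line[end:]


if __name__ == "__main__":
    print(right_justify_column("ab 42    xyz", 3, 8))
```

As in the BASIC listing, the relative position of the column does not change: the number is padded on the left with blanks until it fills the field.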



DESCRIPTION OF KEY-WORD-IN-CONTEXT PROGRAM (KWIC) 



The user wishing to produce a concordance will first 
choose his computer terminal and then his typing element. 

Any language that can be keyboarded from a typewriter and 
for which a typing element exists can be computer processed 
in upper and lower case with the respective diacritical 
marks. The print-out will be in perfect alphabetical order. 
The typing element may give some problem. The Portuguese 
element is IBM BR-971 which unfortunately has no number one 
separated from lower case L (1). The problem is resolved by 
a special output program which converts all number ones to 
lower case L. However, it is necessary to input the symbol 
= from the 2741 IBM terminal. Also, the question mark (?) 
presents a problem since on the BR-971 element it is on 
upper case 6 or key 19 which on the 2741 is the control 
input. The problem is resolved by first typing the question 
mark (?) and then the nul character, upper case key 41 or 
+ on element BR-971. The Brazilian or Portuguese element
will work equally well for French. Most of the elements for 
French are twelve pitch and they also have * on the nul key 
position. For Spanish, the Puerto Rican element, No. 040 or 
part no. 1167040, presents no problems. There is a separate 
number one with the reverse question and exclamation marks. 
The Bulgarian typing element has been used for Russian, but 
it is understood that IBM has recently come out with three 
new elements with the Cyrillic alphabet. 

With the special typing element the user prepares a 
file named after the language, using only the first four 
letters in caps. This file has three lines exactly. The 
lines are not sequence-numbered. The first line contains
the capital or upper-case letters of the alphabet in the 
desired order for sorting. The lower-case letters are on 
the second line in their sort-order. The third line (no 
spaces between the lines) has those diacritical marks which 
require the use of the back-space before typing them. The
accent marks are also arranged in "dictionary" order. Also, 
symbols used for sorts are placed in this line and they may 
or may not appear in the print-out. 
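
A minimal Python sketch of how such a three-line language file could be turned into sort tables is given below. It is an illustration only: the function name and the sample alphabet are hypothetical, and the tables merely parallel the roles that the arrays W, T, and D appear to play in the FOR-KWIC listing.

```python
def load_sort_tables(lines):
    """Build rank tables from the three lines of a language file.

    lines[0]: upper-case letters in sort order
    lines[1]: lower-case letters in sort order
    lines[2]: diacritical marks (and any extra sort symbols) in sort order
    """
    upper, lower, marks = lines              # the file has exactly three lines
    letter_rank, is_upper = {}, {}
    for rank, ch in enumerate(upper, start=1):
        letter_rank[ch] = rank
        is_upper[ch] = True
    for rank, ch in enumerate(lower, start=1):
        letter_rank[ch] = rank               # same rank as its capital
        is_upper[ch] = False
    mark_rank = {ch: rank for rank, ch in enumerate(marks, start=1)}
    return letter_rank, is_upper, mark_rank


if __name__ == "__main__":
    # Hypothetical, shortened alphabet for demonstration.
    tables = load_sort_tables(["ABCÇD", "abcçd", "'`^~"])
    print(tables[0]["ç"], tables[2]["^"])
```

Because a letter's rank is its position on the line, changing the order of characters in the file changes the sort order without any change to the program.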

Next a languages file is prepared on the same terminal 
using the same typing element. Any name up to eight 
characters may be used - the computer program will ask for 
its name. The lines of the file are sequence-numbered. Each
new source is labeled on a line preceding the input material. 
Each label has eight characters, namely, three digits, two 
letters, and three digits. This label provides for 
identifying the source, its classification, and the page of 
computer input for later reference. In the input material
of this file, the sequence for separable diacritical marks 
is letter, backspace, and diacritical mark. 
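
The label convention can be checked mechanically. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, and the pattern follows the prose description (three digits, two letters, three digits) with capital letters assumed. The FNL$ function in the FOR-KWIC listing applies a looser test (eight characters, no blank, no lower-case letters).

```python
import re

# Three digits, two capital letters, three digits (capitals are an assumption).
LABEL_PATTERN = re.compile(r"\d{3}[A-Z]{2}\d{3}")


def strip_sequence_number(line: str) -> str:
    """Drop the sequence number: keep everything after the first blank."""
    _, _, rest = line.partition(" ")
    return rest


def is_label(text: str) -> bool:
    """True if `text` is exactly an eight-character source label."""
    return LABEL_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    line = "100 001SP012"                    # hypothetical input line
    print(is_label(strip_sequence_number(line)))
```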



Blanks have meaning. Each sequence-number must be 
followed by at least one blank. Capitalized words which are 
not proper names are to be preceded by at least two spaces, 
to show that the capitalization arises from being at the 
start of a sentence, or the like. Proper names at the 
beginning of sentences will have only one space before 
them, though this is usually wrong style.

In addition to the input file, two others will be 
needed. An intermediate-results file may be called KWIC or 
any other name. A file for storing final results will also 
be necessary. The contents of the KWIC file are of little 
importance since it gets scratched and changed. 

Next the user chooses his L and R positions. These are 
the number of characters of context to the left and 
respectively to the right of the index point of the 
character of the sort field or the KEY word. The total of 
values of L and R cannot exceed 112. In the programs
FOR-KWIC, FOR-LIST, REV-KWIC, and REV-LIST L has been set at
fifty-five and R at fifty-seven in the first two statements.
The user should change these to suit his own terminal and
taste. L need not be equal to R, but FOR-KWIC must agree
with FOR-LIST, and REV-KWIC must agree with REV-LIST.
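
The role of L and R can be illustrated with a short Python sketch. It is not a transcription of FOR-KWIC: the function name is hypothetical, and words are taken to be runs of non-blank characters, whereas the BASIC program recognizes words through the alphabet file. The text is padded with L blanks on the left and R blanks on the right so that every key word has a full window of context.

```python
import re


def kwic_contexts(text: str, left: int = 55, right: int = 57):
    """Return (word, context) pairs; each context keeps `left` characters
    before the first letter of the word and `right` characters after it."""
    padded = " " * left + text + " " * right
    results = []
    for match in re.finditer(r"\S+", text):
        p = match.start() + left             # index of the word in `padded`
        results.append((match.group(), padded[p - left : p + right + 1]))
    return results


if __name__ == "__main__":
    for word, context in kwic_contexts("o gato dorme", left=4, right=6):
        print(repr(context), word)
```

Every context has the same length, and the key word always begins at the same column, which is what lets the listing program align the key words vertically.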

To begin the concordance program the user calls for 
OLD FOR-KWIC and keyboards RUN. The computer will ask for 
the input file. For the languages only the first four 
letters are to be written. Any intermediate file may be 
used. The intermediate file must next be sorted and this is 
easily done by running the program SHELGAME. Finally, the
user calls for OLD FOR-LIST and types RUN. The desired
concordance can be obtained by going to the storage file. 

All this is assuming that a forward concordance is desired. 
For a backward concordance use REV-KWIC and REV-LIST. 
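
The effect of a backward concordance can be sketched in Python as sorting on the reversed spelling of each word, so that words sharing an ending fall together. The function name and the sample words are illustrative assumptions; the REV-KWIC listing achieves the same end by reversing the whole text before the keys are made.

```python
def reverse_sorted(words):
    """Sort words by their spelling read from right to left."""
    return sorted(words, key=lambda word: word[::-1])


if __name__ == "__main__":
    print(reverse_sorted(["partir", "falou", "comer", "falar"]))
```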

The programs FOR-LIST and REV-LIST assume that the 
intermediate file is called KWIC. If some other name is 
used, the user will, of course, make this change in his file 
statement. 

SUBPROGRAMS FOR ARRANGING LETTERS IN ALPHABETICAL ORDER



The alphabetical order for each language is used, except 
for the ch, ll, and rr of Spanish, which use a special output
program. The program name is the first four letters of each 
language. Extra features, such as tags, can be added to 
the third or diacritical mark line for special sorts. The 
final output program can convert them to blanks. 



FREN 



ABCÇDEFGHIJKLMNOPQRSTUVWXYZ
abcçdefghijklmnopqrstuvwxyz






PORT 



ABCÇDEFGHIJKLMNOPQRSTUVWXYZ
abcçdefghijklmnopqrstuvwxyz
&* 



RUSS



абвгдежзийклмнопрстуфхцчшщъыьэюя



SPAN



ABCDEFGHIJKLMNÑOPQRSTUVWXYZ
abcdefghijklmnñopqrstuvwxyz

SAMPLE TELETYPE INPUT FORMAT



VERIFY

100 REM WRITTEN BY HAROLD KAPLAN
110 REM
120 REM DESCRIPTION - THIS PROGRAM WILL COMPARE TWO FILES
130 REM               AND WILL PRINT OUT DISCREPANCIES
140 REM               FOUND. THIS IS TO CHECK ACCURACY OF
150 REM               INPUT DATA.
160 REM
170 REM MAIN PROGRAM -
180 PRINT "FIRST FILE";
190 INPUT A$
200 PRINT "SECOND FILE";
210 INPUT B$
220 FILE #1:A$
230 FILE #2:B$
240 LINPUT #1:X$
250 LINPUT #2:Y$
260 IF X$=Y$ THEN 300
270 PRINT "DISCREPANCY"
280 PRINT X$
290 PRINT Y$
300 IF MORE #1 THEN 340
310 IF MORE #2 THEN 370
320 PRINT "ALL DONE"
330 STOP
340 IF MORE #2 THEN 240
350 PRINT A$;" IS LONGER"
360 STOP
370 PRINT B$;" IS LONGER"
380 STOP
390 END
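
The comparison performed by VERIFY can be sketched in Python as follows. The function name and its arguments are hypothetical; the sketch works on lists of lines already read into memory and returns the report instead of printing it.

```python
def compare_lines(first, second, first_name="first", second_name="second"):
    """Report mismatched line pairs, then which input (if any) is longer."""
    report = []
    for a, b in zip(first, second):          # stops at the end of the shorter input
        if a != b:
            report.extend(["DISCREPANCY", a, b])
    if len(first) > len(second):
        report.append(f"{first_name} IS LONGER")
    elif len(second) > len(first):
        report.append(f"{second_name} IS LONGER")
    else:
        report.append("ALL DONE")
    return report


if __name__ == "__main__":
    for item in compare_lines(["um", "dois"], ["um", "doiz"]):
        print(item)
```

Keying the same material twice and comparing the two files line by line is a simple way to catch typing errors, since the same mistake is unlikely to be made at the same place in both copies.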

FOR-KWIC

100 REM PROGRAMMED BY PROF. HAROLD KAPLAN, U.S. NAVAL ACADEMY
110 REM THIS MAKES THE CONTEXTS FOR FORWARD CONCORDANCES. IT
120 REM CALLS FKEY AND FCAPS. DON'T FORGET TO CHANGE L AND R
130 REM HERE AND IN FOR-LIST.
140 LET L=55
150 LET R=57
160 LET L9$=""
170 FOR J=1 TO L
180 LET L9$=L9$&" "
190 NEXT J
200 LET R9$=""
210 FOR J=1 TO R
220 LET R9$=R9$&" "
230 NEXT J
240 PRINT "NAME OF INPUT FILE";
250 INPUT A$
260 PRINT "NAME OF LANGUAGE (First four letters only)"
270 INPUT B$
280 PRINT "In which file are the results to be stored"
290 INPUT C$
300 FILE #1:A$
310 FILE #2:B$
320 FILE #3:C$
330 SCRATCH #3
340 MARGIN #3:4095
350 LINPUT #2:B2$
360 CHANGE B2$ TO B
370 DIM B(100)
380 FOR J=1 TO B(0)
390 LET V(B(J))=1
400 LET W(B(J))=J
410 LET T(B(J))=1
420 NEXT J
430 DIM V(128),W(128),T(128)
440 LINPUT #2:B2$
450 CHANGE B2$ TO B
460 FOR J=1 TO B(0)
470 LET V(B(J))=1
480 LET W(B(J))=J
490 NEXT J
500 LINPUT #2:B2$
510 CHANGE B2$ TO B
520 FOR J=1 TO B(0)
530 LET D(B(J))=J
540 NEXT J
550 DIM D(128)
560 IF MORE #1 THEN 590
570 PRINT "EMPTY INPUT FILE"
580 STOP
590 LINPUT #1:L2$
600 LET L$=FNC$(L2$)
610 IF FNL$(L$)="T" THEN 640
620 PRINT "FIRST LINE IS NOT LABEL: ";L2$
630 STOP
640 LET L3$=L$
650 LET L4$=L3$
660 IF LEN(L4$)>0 THEN 690
670 PRINT "DONE. Now RUN SHELGAME"
680 STOP
690 LET C$=""
700 IF END #1 THEN 780
710 LINPUT #1:L2$
720 LET L$=FNC$(L2$)
730 IF FNL$(L$)="T" THEN 760
740 LET C$=C$&" "&L$
750 GO TO 700
760 LET L3$=L$
770 GO TO 800
780 LET L3$=""
790 GO TO 800
800 CHANGE L9$&C$&R9$ TO Z
810 REM FOR REVERSE CONCORDANCE FOR J=1 TO Z(0)/2
820 REM FOR REVERSE CONCORDANCE LET T9=Z(J)
830 REM FOR REVERSE CONCORDANCE LET Z(J)=Z(Z(0)+1-J)
840 REM FOR REVERSE CONCORDANCE LET Z(Z(0)+1-J)=T9
850 REM FOR REVERSE CONCORDANCE NEXT J
860 REM FOR REVERSE CONCORDANCE FOR J=2 TO Z(0)-1
870 REM FOR REVERSE CONCORDANCE IF Z(J)<>8 THEN 830
880 REM FOR REVERSE CONCORDANCE LET T9=Z(J+1)
890 REM FOR REVERSE CONCORDANCE LET Z(J+1)=Z(J-1)
900 REM FOR REVERSE CONCORDANCE LET Z(J-1)=T9
910 REM FOR REVERSE CONCORDANCE NEXT J
920 FOR J=1 TO Z(0)
930 LET Y(J)=ASC(B)+V(Z(J))*(ASC(N)-ASC(B))
940 NEXT J
950 LET Y(0)=Z(0)
960 CHANGE Z TO Z$
970 CHANGE Y TO Y$
980 DIM Z(4095),Y(4095)
990 LET P=0
1000 LET P=POS(Y$,"N",P)
1010 IF P=0 THEN 1110
1020 CALL "FKEY":Z(),W(),T(),D(),P,K$,Q
1030 LIBRARY "FKEY","FCAPS"
1040 PRINT #3:K$&L4$&SEG$(Z$,P-L,P+R)&STR$(Q)
1050 LET P=POS(Y$,"B",P)
1060 IF P=0 THEN 1110
1070 IF Z(P)<>8 THEN 1100
1080 LET P=P+2
1090 GO TO 1050
1100 GO TO 1000
1110 GO TO 650
1120 DEF FNC$(A$)
1130 LET A=POS(A$," ",1)
1140 REM LET A=0
1150 LET FNC$=SEG$(A$,A+1,LEN(A$))
1160 FNEND
1170 DEF FNL$(A$)
1180 IF LEN(A$)<>8 THEN 1260
1190 LET A=POS(A$," ",1)
1200 IF A>0 THEN 1260
1210 FOR J=1 TO 8
1220 IF SEG$(A$,J,J)>=CHR$(97) THEN 1260
1230 NEXT J
1240 LET FNL$="T"
1250 GO TO 1270
1260 LET FNL$="F"
1270 FNEND
1280 END

FKEY

100 REM PROGRAMMED BY HAROLD KAPLAN
110 REM THIS GENERATES THE SORT-KEYS FOR FORWARD CONCORDANCES.
120 REM K$ IS THE KEY, AND Q IS THE LENGTH OF THE WORD FOUND.
130 REM FKEY IS CALLED BY FOR-KWIC.
140 SUB "FKEY":Z(),W(),T(),D(),P,K$,Q
150 CALL "FCAPS":Z(),P,(ASC(a)-ASC(A))
160 LET F$="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
170 LET J=1
180 FOR I=1+P-1 TO 30+P-1
190 IF Z(I)=8 THEN 260
200 IF W(Z(I))=0 THEN 300
210 LET A(J)=W(Z(I))+64 'LETTER
220 LET B(J)=64 'DIACRITICAL MARK
230 LET C(J)=T(Z(I))+64 'CASE
240 LET J=J+1
250 GO TO 280
260 LET B(J-1)=D(Z(I+1))+64
270 LET I=I+1
280 NEXT I
290 LET I=I+1
300 LET A(0)=B(0)=C(0)=J-1
310 LET Q=I-(1+P-1)
320 CHANGE A TO A$
330 CHANGE B TO B$
340 CHANGE C TO C$
350 LET K$=FNS$(A$)&FNS$(B$)&FNS$(C$)
360 LET J=1
370 FOR I2=I TO 30+P-1
380 IF Z(I2)=8 THEN 440
390 LET A(J)=W(Z(I2))+64 'LETTER
400 LET B(J)=64 'DIACRITICAL MARK
410 LET C(J)=T(Z(I2))+64 'CASE
420 LET J=J+1
430 GO TO 460
440 LET B(J-1)=D(Z(I2+1))+64
450 LET I2=I2+1
460 NEXT I2
470 LET A(0)=B(0)=C(0)=J-1
480 CHANGE A TO A$
490 CHANGE B TO B$
500 CHANGE C TO C$
510 LET K$=K$&FNS$(A$)&FNS$(B$)&FNS$(C$)
520 DIM A(30),B(30),C(30)
530 DEF FNS$(X$)=SEG$(X$&F$,1,15)
540 SUBEND
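
The three-level key that FKEY builds (letters first, then diacritical marks, then case, each padded to fifteen positions) can be sketched in Python. This is an approximation under stated assumptions, not a transcription: the alphabet, the marks table, and the function name are hypothetical, and the key is a tuple of integers where the BASIC program builds a character string.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"          # hypothetical sort order
LETTER_RANK = {}
for rank, ch in enumerate(ALPHABET, start=1):
    LETTER_RANK[ch] = rank
    LETTER_RANK[ch.upper()] = rank               # capitals share the letter rank
MARK_RANK = {"'": 1, "`": 2, "^": 3, "~": 4}     # hypothetical diacritics line
BACKSPACE = "\b"
WIDTH = 15                                       # positions per field, as in FNS$


def sort_key(word):
    """Return letters + marks + cases as one tuple; earlier fields dominate."""
    letters, marks, cases = [], [], []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch == BACKSPACE and letters and i + 1 < len(word):
            # letter, backspace, mark: the mark modifies the previous letter
            marks[-1] = MARK_RANK.get(word[i + 1], 0)
            i += 2
            continue
        if ch not in LETTER_RANK:                # a non-letter ends the word
            break
        letters.append(LETTER_RANK[ch])
        marks.append(0)
        cases.append(1 if ch.isupper() else 0)
        i += 1

    def pad(values):
        return tuple((values + [0] * WIDTH)[:WIDTH])

    return pad(letters) + pad(marks) + pad(cases)


if __name__ == "__main__":
    words = ["ca\b~o", "Cao", "cabo", "cao"]
    print([w.replace(BACKSPACE, "<BS>") for w in sorted(words, key=sort_key)])
```

Placing all the letter ranks before all the mark ranks means that accents and capitalization only break ties between words that are otherwise spelled alike, which is the usual dictionary convention.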

FCAPS

100 REM PROGRAMMED BY HAROLD KAPLAN
110 REM THIS CHECKS FOR CAPITALS AT THE BEGINNINGS OF SENTENCES
120 REM BY LOOKING FOR THE SPACES BEFORE. IT IS NEEDED BY FOR-KWIC
130 REM AND FOR-LIST.
140 SUB "FCAPS":Z(),P,D
150 IF Z(P-2)<>ASC( ) THEN 180
160 IF Z(P-1)<>ASC( ) THEN 180
170 LET Z(P)=Z(P)+D
180 SUBEND
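
The two-space convention tested by FCAPS can be sketched in Python. The function name is hypothetical, and the sketch works on a string where the BASIC subprogram works on an array of character codes: when the word starting at position p is preceded by two blanks, its first letter is treated as a sentence-initial capital and changed to lower case, so that it sorts with the ordinary form of the word.

```python
def lower_sentence_initial(text: str, p: int) -> str:
    """Lower-case the character at 0-based index `p` if two blanks precede it."""
    if p >= 2 and text[p - 2 : p] == "  ":
        return text[:p] + text[p].lower() + text[p + 1 :]
    return text


if __name__ == "__main__":
    print(lower_sentence_initial("fim.  Ele disse", 6))
    print(lower_sentence_initial("o Brasil", 2))
```

A proper name keyed with a single preceding space is left untouched, which is why the input rules distinguish one space from two.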

SHELGAME

100 REM PROGRAMMED BY HAROLD KAPLAN
110 REM THIS RATHER TRIVIAL PROGRAM MERELY READS IN "KWIC", CALLS
120 REM FOR A SORT, SCRATCHES "KWIC", AND WRITES OUT THE RESULT
130 REM INTO "KWIC".
140 FILE #1:"KWIC"
150 DIM X$(3000)
160 LET J=J+1
170 LINPUT #1:X$(J)
180 IF MORE #1 THEN 160
190 CALL "SHELSORT":X$(),J
200 LIBRARY "SHELSORT"
210 SCRATCH #1
220 MARGIN #1:4000
230 FOR K=1 TO J
240 PRINT #1:X$(K)
250 NEXT K
260 PRINT J;"ITEMS SORTED. Now go to FOR-LIST or REV-LIST"
270 END

SHELSORT

100 REM PROGRAMMED BY HAROLD KAPLAN
110 REM THIS IS THE FAMOUS SORT BY SHELL'S ALGORITHM. N IS HOW MANY
120 REM STRINGS THERE ARE IN THE ARRAY X$(). THEY ARE SORTED IN
130 REM ASCENDING ORDER.
140 SUB "SHELSORT":X$(),N
150 LET H=INT(N/2)
160 IF H=0 THEN 400
170 FOR W=1 TO H
180 LET T=-1
190 FOR J=W TO N-H STEP H
200 IF X$(J)<=X$(J+H) THEN 250
210 LET T=+1
220 LET Q$=X$(J)
230 LET X$(J)=X$(J+H)
240 LET X$(J+H)=Q$
250 NEXT J
260 IF T<0 THEN 370
270 LET T=-1
280 FOR J=W+H*INT((N-W)/H) TO 1+H STEP -H
290 IF X$(J-H)<=X$(J) THEN 340
300 LET T=+1
310 LET Q$=X$(J)
320 LET X$(J)=X$(J-H)
330 LET X$(J-H)=Q$
340 NEXT J
350 IF T<0 THEN 370
360 GO TO 180
370 NEXT W
380 LET H=INT(H/2)
390 GO TO 160
400 SUBEND
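
The same procedure can be sketched in Python with 0-based indexing. The function name is hypothetical. As in the listing, the gap is halved on each round, and within each gap every chain of elements is ordered by alternating forward and backward exchange passes until a pass makes no swap; many textbook versions of Shell's method use a gapped insertion instead.

```python
def shell_sort(items):
    """Sort `items` in place in ascending order and return it."""
    n = len(items)
    gap = n // 2
    while gap > 0:
        for start in range(gap):                 # one chain per starting offset
            while True:
                swapped = False
                for j in range(start, n - gap, gap):          # forward pass
                    if items[j] > items[j + gap]:
                        items[j], items[j + gap] = items[j + gap], items[j]
                        swapped = True
                if not swapped:
                    break
                swapped = False
                last = start + gap * ((n - 1 - start) // gap)  # end of this chain
                for j in range(last, start, -gap):            # backward pass
                    if items[j - gap] > items[j]:
                        items[j - gap], items[j] = items[j], items[j - gap]
                        swapped = True
                if not swapped:
                    break
        gap //= 2
    return items


if __name__ == "__main__":
    print(shell_sort(["pera", "uva", "abacaxi", "caju", "banana"]))
```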

FOR-LIST

100 REM PROGRAMMED BY HAROLD KAPLAN
110 REM This reads in "KWIC", reformats, and prints
120 REM the results. It calls FCAPS. Don't forget to
130 REM change L and R to agree with FOR-KWIC.
140 PRINT "in which file are the results to be stored"
150 INPUT S$
160 FILE #2:S$
170 SCRATCH #2
180 MARGIN #2:130
190 LET L=55
200 LET R=57
210 LIBRARY "FCAPS"
220 MARGIN 4000
230 FILE #1:"KWIC"
240 LINPUT #1:X$
250 LET A$=SEG$(X$,91,91+8-1)
260 LET B$=SEG$(X$,91+8,91+8+L-1)
270 LET C$=SEG$(X$,91+8+L,91+8+L+1+R-1)
280 LET D$=SEG$(X$,91+8+L+1+R,LEN(X$))
290 LET D=VAL(D$)
300 LET E$=SEG$(B$&C$,L-1,L+D)
310 CHANGE E$ TO Z
320 DIM Z(290)
330 CALL "FCAPS":Z(),3,(ASC(a)-ASC(A))
340 CHANGE Z TO E$
350 LET E$=SEG$(E$,3,LEN(E$))
360 IF E$=E2$ THEN 430
370 IF F<>1 THEN 400
380 PRINT #2
390 GO TO 410
400 PRINT #2:TAB(L);F
410 LET F=0
420 LET E2$=E$
430 LET P=0
440 LET H$=""
450 LET P=POS(B$,CHR$(8),P+1)
460 IF P=0 THEN 490
470 LET H$=H$&" "
480 GO TO 450
490 LET B$=H$&B$
500 PRINT #2:B$;TAB(L+1);C$;TAB(L+R+3);A$
510 LET F=F+1
520 IF MORE #1 THEN 240
530 PRINT #2:TAB(L);F
540 END






REV-KWIC

100 REM PROGRAMMED BY HAROLD KAPLAN
110 REM THIS IS THE CONTEXT-MAKER FOR REVERSE CONCORDANCES.
120 REM DON'T FORGET TO CHANGE R AND L HERE AND IN REV-LIST.
130 LET L=40
140 LET R=40
150 LET L9$=""
160 FOR J=1 TO L
170 LET L9$=L9$&" "
180 NEXT J
190 LET R9$=""
200 FOR J=1 TO R
210 LET R9$=R9$&" "
220 NEXT J
230 PRINT "Input file, Language, Output file"
240 INPUT A$,B$,C$
250 FILE #1:A$
260 FILE #2:B$
270 FILE #3:C$
280 SCRATCH #3
290 MARGIN #3:4095
300 LINPUT #2:B2$
310 CHANGE B2$ TO B
320 DIM B(100)
330 FOR J=1 TO B(0)
340 LET V(B(J))=1
350 LET W(B(J))=J
360 LET T(B(J))=1
370 NEXT J
380 DIM V(128),W(128),T(128)
390 LINPUT #2:B2$
400 CHANGE B2$ TO B
410 FOR J=1 TO B(0)
420 LET V(B(J))=1
430 LET W(B(J))=J
440 NEXT J
450 LINPUT #2:B2$
460 CHANGE B2$ TO B
470 FOR J=1 TO B(0)
480 LET D(B(J))=J
490 NEXT J
500 DIM D(128)
510 IF MORE #1 THEN 540
520 PRINT "EMPTY INPUT FILE"
530 STOP
540 LINPUT #1:L2$
550 LET L$=FNC$(L2$)
560 IF FNL$(L$)="T" THEN 590
570 PRINT "FIRST LINE IS NOT LABEL: ";L2$
580 STOP
590 LET L3$=L$
600 LET L4$=L3$
610 IF LEN(L4$)>0 THEN 640
620 PRINT "DONE - Now go to SHELGAME"
630 STOP
640 LET C$=""
650 IF END #1 THEN 730
660 LINPUT #1:L2$
670 LET L$=FNC$(L2$)
680 IF FNL$(L$)="T" THEN 710
690 LET C$=C$&" "&L$
700 GO TO 650
710 LET L3$=L$
720 GO TO 750
730 LET L3$=""
740 GO TO 750
750 CHANGE L9$&C$&R9$ TO Z
760 FOR J=1 TO Z(0)/2
770 LET T9=Z(J)
780 LET Z(J)=Z(Z(0)+1-J)
790 LET Z(Z(0)+1-J)=T9
800 NEXT J
810 FOR J=2 TO Z(0)-1
820 IF Z(J)<>8 THEN 860
830 LET T9=Z(J+1)
840 LET Z(J+1)=Z(J-1)
850 LET Z(J-1)=T9
860 NEXT J
870 FOR J=1 TO Z(0)
880 LET Y(J)=ASC(B)+V(Z(J))*(ASC(N)-ASC(B))
890 NEXT J
900 LET Y(0)=Z(0)
910 CHANGE Z TO Z$
920 CHANGE Y TO Y$
930 DIM Z(4095),Y(4095)
940 LET P=0
950 LET P=POS(Y$,"N",P)
960 IF P=0 THEN 1060
970 CALL "RKEY":Z(),W(),T(),D(),P,K$,Q
980 LIBRARY "RKEY","RCAPS"
990 PRINT #3:K$&L4$&SEG$(Z$,P-L,P+R)&STR$(Q)
1000 LET P=POS(Y$,"B",P)
1010 IF P=0 THEN 1060
1020 IF Z(P)<>8 THEN 1050
1030 LET P=P+2
1040 GO TO 1000
1050 GO TO 950
1060 GO TO 600
1070 DEF FNC$(A$)
1080 LET A=POS(A$," ",1)
1100 LET FNC$=SEG$(A$,A+1,LEN(A$))
1110 FNEND
1120 DEF FNL$(A$)
1130 IF LEN(A$)<>8 THEN 1210
1140 LET A=POS(A$," ",1)
1150 IF A>0 THEN 1210
1160 FOR J=1 TO 8
1170 IF SEG$(A$,J,J)>=CHR$(97) THEN 1210
1180 NEXT J
1190 LET FNL$="T"
1200 GO TO 1220
1210 LET FNL$="F"
1220 FNEND
1230 END

RKEY

100 REM PROGRAMMED BY HAROLD KAPLAN
110 REM THIS IS THE SORT-KEY MAKER FOR REVERSE CONCORDANCES.
120 REM IT IS CALLED BY REV-KWIC. K$ IS THE RESULTING KEY,
130 REM AND Q IS THE LENGTH OF THE WORD FOUND.
140 SUB "RKEY":Z(),W(),T(),D(),P,K$,Q
150 CALL "RCAPS":Z(),P,(ASC(a)-ASC(A))
160 LET F$="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
170 LET J=1
180 FOR I=1+P-1 TO 30+P-1
190 IF Z(I)=8 THEN 260
200 IF W(Z(I))=0 THEN 300
210 LET A(J)=W(Z(I))+64 'LETTER
220 LET B(J)=64 'DIACRITICAL MARK
230 LET C(J)=T(Z(I))+64 'CASE
240 LET J=J+1
250 GO TO 280
260 LET B(J-1)=D(Z(I+1))+64
270 LET I=I+1
280 NEXT I
290 LET I=I+1
300 LET A(0)=B(0)=C(0)=J-1
310 LET Q=I-(1+P-1)
320 CHANGE A TO A$
330 CHANGE B TO B$
340 CHANGE C TO C$
350 LET K$=FNS$(A$)&FNS$(B$)&FNS$(C$)
360 LET J=1
370 FOR I2=I TO 30+P-1
380 IF Z(I2)=8 THEN 440
390 LET A(J)=W(Z(I2))+64 'LETTER
400 LET B(J)=64 'DIACRITICAL MARK
410 LET C(J)=T(Z(I2))+64 'CASE
420 LET J=J+1
430 GO TO 460
440 LET B(J-1)=D(Z(I2+1))+64
450 LET I2=I2+1
460 NEXT I2
470 LET A(0)=B(0)=C(0)=J-1
480 CHANGE A TO A$
490 CHANGE B TO B$
500 CHANGE C TO C$
510 LET K$=K$&FNS$(A$)&FNS$(B$)&FNS$(C$)
520 DIM A(30),B(30),C(30)
530 DEF FNS$(X$)=SEG$(X$&F$,1,15)
540 SUBEND

RCAPS

100 REM PROGRAMMED BY HAROLD KAPLAN
110 REM This is needed by REV-KWIC and REV-LIST. It looks
120 REM for starts of sentences marked by two spaces, and
121 REM changes capitals to lower-case.
130 SUB "RCAPS":Z(),P,D
140 IF Z(P+2)<>ASC( ) THEN 170
150 IF Z(P+1)<>ASC( ) THEN 170
160 LET Z(P)=Z(P)+D
170 SUBEND



REV-LIST

100 REM PROGRAMMED BY HAROLD KAPLAN
110 REM THIS REFORMATS "KWIC" FOR REVERSE CONCORDANCES AND PRINTS
120 REM THE RESULT. DON'T FORGET TO CHANGE L AND R HERE TO AGREE
130 REM WITH REV-KWIC.
140 PRINT "in which file are the results to be stored"
150 INPUT S$
160 FILE #2:S$
170 SCRATCH #2
180 MARGIN #2:130
190 LET L=40
200 LET R=40
210 LIBRARY "FCAPS" 'YES IT IS FCAPS, NOT RCAPS.
220 MARGIN 4000
230 FILE #1:"KWIC"
240 LINPUT #1:X$
250 LET A$=SEG$(X$,91,91+8-1)
260 LET G$=SEG$(X$,91+8,91+8+L+1+R-1)
270 CHANGE G$ TO G
280 FOR J=1 TO G(0)/2
290 LET T9=G(J)
300 LET G(J)=G(G(0)+1-J)
310 LET G(G(0)+1-J)=T9
320 NEXT J
330 CHANGE G TO G$
340 LET B$=SEG$(G$,1,R+1)
350 LET C$=SEG$(G$,R+2,R+L+1)
360 DIM G(300)
370 LET D$=SEG$(X$,91+8+L+1+R,LEN(X$))
380 LET D=VAL(D$)
390 LET E$=SEG$(B$,LEN(B$)-D+1-2,LEN(B$))
400 CHANGE E$ TO Z
410 DIM Z(290)
420 CALL "FCAPS":Z(),3,(ASC(a)-ASC(A))
430 CHANGE Z TO E$
440 LET E$=SEG$(E$,3,LEN(E$))
450 IF E$=E2$ THEN 520
460 IF F<>1 THEN 490
470 PRINT #2
480 GO TO 500
490 PRINT #2:TAB(L-1);F
500 LET F=0
510 LET E2$=E$
520 LET P=0
530 LET H$=""
540 LET P=POS(B$,CHR$(8),P+1)
550 IF P=0 THEN 580
560 LET H$=H$&" "
570 GO TO 540
580 LET B$=H$&B$
590 PRINT #2:B$;TAB(L+2);C$;TAB(L+R+4);A$



PRINT-OUT KWIC 